e2e CI Job #259

danehans · 2025-01-30T04:37:45Z

Create a pre-submit CI job that runs the e2e tests.

Does the project create a fake model server or use a model that does not require a signed license agreement (xref)? @liu-cong suggested using Qwen here, but I have yet to test vLLM with LoRA support.
How does the project acquire at least 3 GPU resources?
Will the job run on a BM/VM cluster, or on a single VM with 3 GPUs using kind?

danehans · 2025-01-30T04:49:28Z

@ahg-g @kfswain IMO this should be required for v0.1.

ahg-g · 2025-01-30T05:03:59Z

I don't think it is required as long as we can run the test manually after each commit.

btw, this requires #244

Jeffwan · 2025-01-30T18:20:06Z

I can help work on the vLLM CPU version. I will post an update soon.

joerunde · 2025-01-30T18:33:22Z

I see that there was intent for vLLM to publish a vllm-cpu release on Docker Hub, but I can't seem to find it:
vllm-project/vllm#11261

danehans · 2025-01-30T22:47:26Z

> I don't think it is required as long as we can run the test manually after each commit.

Okay, but manual testing post-commit is a bit risky. Adding the CI job should be a high priority post-v0.1.

Jeffwan · 2025-02-05T01:14:15Z

I built one, published it to my personal Docker Hub, and ran it successfully.

The only thing worth noting is that the image is still ~9GB, which could cause a "No space left" issue on the CI compute node.

root@129-213-130-206:/home/ubuntu# docker images
REPOSITORY     TAG       IMAGE ID       CREATED          SIZE
vllm-cpu-env   latest    8ddb28e44d97   10 minutes ago   9.19GB
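
A pre-flight check along these lines could catch the disk-space problem before the image is pulled. This is only a sketch: the function names, the 2x headroom factor, and the idea of checking the Docker data directory are assumptions for illustration, not something the project does today.

```python
import shutil

# Size reported by `docker images` above (Docker reports decimal GB).
IMAGE_SIZE_GB = 9.19


def required_gb(image_gb=IMAGE_SIZE_GB, headroom=2.0):
    """Free space to require: image size times a headroom factor.

    The 2x factor is an assumption meant to cover the compressed layers
    and the unpacked image existing side by side during a pull.
    """
    return image_gb * headroom


def has_room_for_image(path="/", image_gb=IMAGE_SIZE_GB, headroom=2.0):
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb(image_gb, headroom)


if __name__ == "__main__":
    # Pass the Docker data directory here if it lives on a separate volume.
    print("enough space:", has_room_for_image("/"))
```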

Scripts I used:

docker run -it --rm --network=host \
  --cpuset-cpus="0-15" --cpuset-mems="0" \
  -v /home/ubuntu/.cache/huggingface:/root/.cache/huggingface \
  seedjeffwan/vllm-cpu-env:bb392af4-20250203 --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules=lora1=/root/.cache/huggingface/hub/models--ai-blond--Qwen-Qwen2.5-Coder-1.5B-Instruct-lora/snapshots/9cde18d8ed964b0519fb481cca6acd936b2ca811

The following snippet should work as well.

docker run -it --rm \
  seedjeffwan/vllm-cpu-env:bb392af4-20250203 --model Qwen/Qwen2.5-1.5B-Instruct \
  --enable-lora \
  --lora-modules=lora1=ai-blond/Qwen-Qwen2.5-Coder-1.5B-Instruct-lora

danehans · 2025-02-06T22:29:00Z

@Jeffwan I can reproduce #259 (comment) but I can't run the qwen model. See this gist for details.

Jeffwan · 2025-02-07T08:25:56Z

@danehans Let me check the gist and see if there's anything I can help with.

nirrozenbaum · 2025-02-26T12:12:59Z

@Jeffwan I also tried running it. In my env it fails to run even in Docker (it also fails in an OpenShift cluster).
Would it be possible to set up a short meeting and go through your setup?

nirrozenbaum · 2025-02-26T17:57:26Z

@Jeffwan @danehans I was able to make a lot of progress and run the CPU-based image in OpenShift.
I need to continue, but the overall direction looks good.
Btw, an OpenShift cluster differs slightly from plain k8s in the default permissions pods run with, so I have no permission on some file paths. I'll fix that in my deployment and hope to get it up and running e2e by tomorrow's weekly.

kubectl -n nirro logs -f vllm-cpu-test-5f5889858c-jcz6k
Defaulted container "lora" out of: lora, adapter-loader (init)
INFO 02-26 17:50:13 __init__.py:186] Automatically detected platform cpu.
INFO 02-26 17:50:13 api_server.py:840] vLLM API server version 0.7.2.dev26+gbb392af4
INFO 02-26 17:50:13 api_server.py:841] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=[LoRAModulePath(name='lora1', path='/adapters/hub/models--ai-blond--Qwen-Qwen2.5-Coder-1.5B-Instruct-lora/snapshots/9cde18d8ed964b0519fb481cca6acd936b2ca811', base_model_name=None)], prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, enable_reasoning=False, reasoning_parser=None, tool_call_parser=None, tool_parser_plugin='', model='Qwen/Qwen2.5-1.5B-Instruct', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', 
tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=True, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 02-26 17:50:13 api_server.py:206] Started engine process with PID 73
INFO 02-26 17:50:18 __init__.py:186] Automatically detected platform cpu.
INFO 02-26 17:50:20 config.py:542] This model supports multiple tasks: {'score', 'classify', 'embed', 'generate', 'reward'}. Defaulting to 'generate'.
WARNING 02-26 17:50:20 config.py:678] Async output processing is not supported on the current platform type cpu.
WARNING 02-26 17:50:20 _logger.py:72] CUDA graph is not supported on CPU, fallback to the eager mode.
WARNING 02-26 17:50:20 _logger.py:72] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
INFO 02-26 17:50:25 config.py:542] This model supports multiple tasks: {'score', 'classify', 'reward', 'embed', 'generate'}. Defaulting to 'generate'.
WARNING 02-26 17:50:25 config.py:678] Async output processing is not supported on the current platform type cpu.
WARNING 02-26 17:50:25 _logger.py:72] CUDA graph is not supported on CPU, fallback to the eager mode.
WARNING 02-26 17:50:25 _logger.py:72] Environment variable VLLM_CPU_KVCACHE_SPACE (GB) for CPU backend is not set, using 4 by default.
INFO 02-26 17:50:25 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.2.dev26+gbb392af4) with config: model='Qwen/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cpu, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=Qwen/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=False, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True, 
INFO 02-26 17:50:25 cpu.py:39] Cannot use None backend on CPU.
INFO 02-26 17:50:25 cpu.py:40] Using Torch SDPA backend.
INFO 02-26 17:50:25 importing.py:16] Triton not installed or not compatible; certain GPU-related functions will not be available.
INFO 02-26 17:50:26 weight_utils.py:252] Using model weights format ['*.safetensors']
INFO 02-26 17:51:39 weight_utils.py:297] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.70it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00,  3.70it/s]

INFO 02-26 17:51:40 punica_selector.py:18] Using PunicaWrapperCPU.
INFO 02-26 17:51:40 executor_base.py:110] # CPU blocks: 9362, # CPU blocks: 0
INFO 02-26 17:51:40 executor_base.py:115] Maximum concurrency for 32768 tokens per request: 4.57x
INFO 02-26 17:51:40 llm_engine.py:431] init engine (profile, create kv cache, warmup model) took 0.12 seconds
INFO 02-26 17:51:40 api_server.py:756] Using supplied chat template:
INFO 02-26 17:51:40 api_server.py:756] None
WARNING 02-26 17:51:40 _logger.py:72] Pin memory is not supported on CPU.
INFO 02-26 17:51:40 serving_models.py:174] Loaded new LoRA adapter: name 'lora1', path '/adapters/hub/models--ai-blond--Qwen-Qwen2.5-Coder-1.5B-Instruct-lora/snapshots/9cde18d8ed964b0519fb481cca6acd936b2ca811'
INFO 02-26 17:51:40 launcher.py:21] Available routes are:
INFO 02-26 17:51:40 launcher.py:29] Route: /openapi.json, Methods: HEAD, GET
INFO 02-26 17:51:40 launcher.py:29] Route: /docs, Methods: HEAD, GET
INFO 02-26 17:51:40 launcher.py:29] Route: /docs/oauth2-redirect, Methods: HEAD, GET
INFO 02-26 17:51:40 launcher.py:29] Route: /redoc, Methods: HEAD, GET
INFO 02-26 17:51:40 launcher.py:29] Route: /health, Methods: GET
INFO 02-26 17:51:40 launcher.py:29] Route: /ping, Methods: POST, GET
INFO 02-26 17:51:40 launcher.py:29] Route: /tokenize, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /detokenize, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/models, Methods: GET
INFO 02-26 17:51:40 launcher.py:29] Route: /version, Methods: GET
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/chat/completions, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/completions, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/embeddings, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /pooling, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /score, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/score, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /rerank, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v1/rerank, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /v2/rerank, Methods: POST
INFO 02-26 17:51:40 launcher.py:29] Route: /invocations, Methods: POST
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Exception in thread Thread-4 (_report_usage_worker):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/usage/usage_lib.py", line 162, in _report_usage_worker
    self._report_usage_once(model_architecture, usage_context, extra_kvs)
  File "/usr/local/lib/python3.10/dist-packages/vllm/usage/usage_lib.py", line 210, in _report_usage_once
    self._write_to_file(data)
  File "/usr/local/lib/python3.10/dist-packages/vllm/usage/usage_lib.py", line 239, in _write_to_file
    os.makedirs(os.path.dirname(_USAGE_STATS_JSON_PATH), exist_ok=True)
  File "/usr/lib/python3.10/os.py", line 215, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.10/os.py", line 225, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/.config'
INFO:     169.61.136.195:49127 - "GET /health HTTP/1.1" 200 OK
INFO:     169.61.136.195:49139 - "GET /health HTTP/1.1" 200 OK
INFO:     169.61.136.195:5769 - "GET /health HTTP/1.1" 200 OK
INFO:     169.61.136.195:5779 - "GET /health HTTP/1.1" 200 OK
INFO:     169.61.136.195:5787 - "GET /health HTTP/1.1" 200 OK
INFO:     169.61.136.195:5799 - "GET /health HTTP/1.1" 200 OK
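
The PermissionError in the log above comes from the usage-stats thread, not from serving itself (the /health probes still return 200). Below is a minimal sketch of why the path may resolve to /.config. It assumes the pod runs with HOME=/ and that vLLM derives its config directory from XDG_CONFIG_HOME and VLLM_CONFIG_ROOT. The function is a simplified model written for illustration, not vLLM's own code.

```python
import os.path


def usage_stats_dir(env):
    """Approximate where the usage-stats file would be written.

    `env` is a plain dict standing in for the container environment.
    The lookup order (VLLM_CONFIG_ROOT, then XDG_CONFIG_HOME, then
    $HOME/.config) is an assumption about vLLM's behaviour.
    """
    home = env.get("HOME", "/")
    config_home = env.get("XDG_CONFIG_HOME", os.path.join(home, ".config"))
    return env.get("VLLM_CONFIG_ROOT", os.path.join(config_home, "vllm"))


if __name__ == "__main__":
    # HOME=/ reproduces the directory from the traceback.
    print(usage_stats_dir({"HOME": "/"}))
    # A writable override (hypothetical path) avoids touching /.config.
    print(usage_stats_dir({"HOME": "/", "VLLM_CONFIG_ROOT": "/tmp/vllm"}))
```

If that model is right, pointing VLLM_CONFIG_ROOT at a writable directory should avoid the traceback. Disabling usage reporting with VLLM_NO_USAGE_STATS=1 or DO_NOT_TRACK=1 would be another option.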

nirrozenbaum · 2025-02-27T12:18:48Z

I was able to run the following:
Using my OpenShift cluster, I deployed a Deployment and a Service selecting the pods. The Deployment specifies the base model Qwen/Qwen2.5-1.5B-Instruct and one LoRA adapter.
After that, I ran an additional pod in the same namespace and sent curl commands to the model pods (I wanted to eliminate potential connectivity issues at this stage of the test). I was able to list the deployed models, run a chat completions request, and get back appropriate responses.

Below is the models REST call:

curl http://vllm-cpu-test.nirro.svc.cluster.local:5678/v1/models | jq
{
  "object": "list",
  "data": [
    {
      "id": "Qwen/Qwen2.5-1.5B-Instruct",
      "object": "model",
      "created": 1740657845,
      "owned_by": "vllm",
      "root": "Qwen/Qwen2.5-1.5B-Instruct",
      "parent": null,
      "max_model_len": 32768,
      "permission": [
        {
          "id": "modelperm-3b74723e5ea640759fee5814352a1be5",
          "object": "model_permission",
          "created": 1740657845,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    },
    {
      "id": "lora1",
      "object": "model",
      "created": 1740657845,
      "owned_by": "vllm",
      "root": "/adapters/hub/models--ai-blond--Qwen-Qwen2.5-Coder-1.5B-Instruct-lora/snapshots/9cde18d8ed964b0519fb481cca6acd936b2ca811",
      "parent": "Qwen/Qwen2.5-1.5B-Instruct",
      "max_model_len": null,
      "permission": [
        {
          "id": "modelperm-58319fb71f684c8c8132f2a6fc87b9ee",
          "object": "model_permission",
          "created": 1740657845,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
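
An e2e test could assert on this response instead of eyeballing it. The sketch below groups adapters under their base model using the "parent" field shown above. The helper name and the trimmed sample are illustrative assumptions; only the "id" and "parent" fields are taken from the response.

```python
import json


def adapters_by_base(models_response):
    """Map each base model id to the adapter ids that list it as parent."""
    adapters = {}
    for entry in models_response.get("data", []):
        parent = entry.get("parent")
        if parent is not None:  # base models have "parent": null
            adapters.setdefault(parent, []).append(entry["id"])
    return adapters


# Trimmed copy of the /v1/models response above (other fields omitted).
SAMPLE = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "Qwen/Qwen2.5-1.5B-Instruct", "object": "model", "parent": null},
    {"id": "lora1", "object": "model", "parent": "Qwen/Qwen2.5-1.5B-Instruct"}
  ]
}
""")

if __name__ == "__main__":
    print(adapters_by_base(SAMPLE))
```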

And the chat/completions call:

curl http://vllm-cpu-test.nirro.svc.cluster.local:5678/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "lora1",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "write a simple python random generator"
      }
    ]
  }' | jq
{
  "id": "chatcmpl-952472450afa4534b65f80f4d526e34a",
  "object": "chat.completion",
  "created": 1740658028,
  "model": "lora1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "Certainly! A simple random generator in Python, often used in various applications like generating random numbers or selecting random elements from a list or dictionary, can be created using the `random` module. Below is a basic example that demonstrates how you can generate random numbers in different ways.\n\n### Simple Random Number Generation (Integer between 0-9)\n```python\nimport random\n\n# Generate a random integer between 0 and 9\nrandom_integer = random.randint(0, 9)\nprint(\"Random integer between 0 and 9:\", random_integer)\n```\n\n### Random Selection in a List\n```python\nimport random\n\n# Assuming you have a list of names or other objects\nnames = [\"Alice\", \"Bob\", \"Charlie\", \"David\", \"Eve\", \"Frank\"]\n\n# Select a random name from the list\nrandom_choice = random.choice(names)\nprint(\"Random name from the group:\", random_choice)\n```\n\n### Randomly Selecting a Key 호 Random Element's Value\n```python\nimport random\n\n# A dictionary of key-value pairs representing geographical locations and their populations\nlocations = {\n    \"Seattle\": 652919,\n    \"Tokyo\": 37935248,\n    \"Paris\": 2140562,\n    \"Osaka\": 24565411,\n    \"Shanghai\": 23005085\n}\n\n# Randomly choose a key and get its corresponding population\nrandom_location = BackgroundChoice = random.choice(list(locations.keys()))\nrandom_population =wine(locations.get(random_location))  # Assuming `get()` returns NoneType, you might want to get the `value` from the `dictionary` et way instead like wine(locations[random_location])\nprint(\"Random location and its population:\", random_location, locations[background_choice])\n```\n\n### Randomly Selecting from a Zip Code Distribution\n```python\nimport random\n\nzip_distribution = {\n    '02': 4,\n    '03': 5,\n    '04': 6,\n    '05': 4,\n    '06': 6\n}\nzip_distribution_population = [4, 5, 6, 4, 6]\n\n# Randomly select a zip code and its corresponding population\nrandom_zip = 
random.choice(list(zip(zip_distribution.keys()))\nrandom_merge_code_population = zip_distribution[random_zip], zip_distribution_population\n\nprint(\"Random zip code and population:\", random_zip)\nprint(\"Random population:\", random_merge_code_population)\n```\n\n### Generatingicz Random Text Strings\n```python\nimport random\n\n# Helper function (think of this like pseudo code). Rotate the character list each time. This assumes ascii values for simplicity.\ndef find_next_letter(ascii_list):\n    return ascii_list[0], ascii_list[1].lower()  # Assuming lowercase letters following the uppercase ones (A-Z)\n    # You may need further adjustments from your formatting needs.\n\n# Generate a random string of fixed length\ndef random_text(length):\n    letter_list = [chr(x) for x in range(65, 91)]  # ASCII for capitalized letters A-Z\n\n    text = ''.join(find_next_letter(letter_list) for _ in range(length))  # Adds a new character for every index\n    return text\n\nprint(\"Random 8-letter string:\", random_text(8))\n```\n\n### Notes:\n1. The `random.choice` function is used for selecting random elements from a list, and `random.randint` generates a whole number in a specific range.\n2. For distributed data generation like zip codes, you'll need a realistic distribution which might require a more complex algorithm.\n3. Forszsz real encryption or cryptographic applications, consider using established libraries like cryptography or pbkdf2 instead of regular randomization forโมзад لساعي دزرو لأش تركيا نده.\n\nThese examples give a good starting point, and you can extend the simplicity as per your specific use case.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "total_tokens": 843,
    "completion_tokens": 818,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
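
For a scripted test, the same request could be built with the Python standard library. This is a sketch under assumptions: the helper names and the 300-second timeout are illustrative, and the base URL is the in-cluster service address from the curl call above.

```python
import json
import urllib.request

# In-cluster service address taken from the curl call above.
BASE_URL = "http://vllm-cpu-test.nirro.svc.cluster.local:5678"


def build_chat_request(model, user_prompt, base_url=BASE_URL):
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and decode the JSON body.

    The generous timeout is an assumption: CPU inference is slow.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = build_chat_request("lora1", "write a simple python random generator")
    print(req.get_method(), req.full_url)
```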

One point to note: the completions request took 2 minutes 23 seconds.

The next step would be to expose the pods via a Gateway and HTTPRoute, and then make the same calls from outside the cluster.
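
To put that latency in perspective, the reported duration and the completion_tokens value from the usage block give a rough generation rate of about 5.7 tokens per second. A small sketch of the arithmetic follows. The function name is hypothetical, and the figure includes prompt processing and network time, so it is only a lower bound on decode speed.

```python
def tokens_per_second(completion_tokens, elapsed):
    """Average generation rate for an elapsed time given as "M:SS"."""
    minutes, seconds = elapsed.split(":")
    total_seconds = int(minutes) * 60 + int(seconds)
    return completion_tokens / total_seconds


if __name__ == "__main__":
    # 818 completion tokens in 2:23 (143 seconds).
    print(f"{tokens_per_second(818, '2:23'):.2f} tokens/s")
```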

nirrozenbaum · 2025-02-27T12:21:30Z

The next question I have is this:
I assume we would like to add this CPU-based vLLM setup to a CI job that runs for each PR as a gating test (this is the context of this thread).
Do we have a cluster that can be used for this purpose? What would the setup be?

danehans · 2025-03-04T05:01:59Z

This gist should help guide you through the process:

https://gist.github.com/danehans/f19f8805155571b65c68cfe057b6e09c

If pulling the Qwen model requires a HF token, we'll need to reopen this:

kubernetes/k8s.io#7698.

@robscott may have additional guidance to share.

nirrozenbaum · 2025-03-04T17:49:23Z

/assign

nirrozenbaum · 2025-03-09T10:42:44Z

The CPU-based example seems to work without the HF token.
I've pushed PR #464 to reflect that.

nirrozenbaum · 2025-03-13T12:38:08Z

Opened issue kubernetes/test-infra#34495, but this should wait until #485 is merged.

danehans · 2025-05-01T17:21:44Z

@nirrozenbaum I see kubernetes/test-infra#34495 has been closed without a comment and #485 has merged. What are the next steps to resolve this issue? cc: @ahg-g @kfswain

nirrozenbaum · 2025-05-01T17:28:51Z

Yes. We have a CPU-based example we can use in e2e tests. The main issue was that it requires a lot of memory and CPU.
I discussed this with Benjamin Elder (SIG Testing TL) around mid-March and understood from him that SIG Testing cannot provide the resources required to run the CPU example.
I think the next steps are:

1. Get issue "Implement Lightweight Scheduler Simulation Tests for Inference Gateway" #709 implemented. IBM is going to open source vllm-simulator in mid-May, which might be useful.
2. If there is a different alternative, that's also fine, but vllm-sim is currently being tested internally at IBM to make sure it complies with GIE.
3. Once this is available, we can run a CI job with the vLLM simulator, which doesn't require many resources.

danehans added the help wanted label Jan 30, 2025

danehans mentioned this issue Feb 6, 2025: Reduced GPU requirements #272 (Closed)

danehans mentioned this issue Mar 4, 2025: use cpu based Qwen model in e2e tests #435 (Closed)

k8s-ci-robot assigned nirrozenbaum Mar 4, 2025

nirrozenbaum mentioned this issue Mar 5, 2025: REQUEST: New membership for nirrozenbaum kubernetes/org#5439 (Closed)

danehans mentioned this issue May 30, 2025: Add vLLM Simulator Support #897 (Open)
