As mentioned in the title, there is no swap deque in class Scheduler, and no swap-related operation in the schedule function.

    class Scheduler:
        def __init__(
            self,
            scheduler_config: SchedulerConfig,
            cache_config: CacheConfig,
            lora_config: Optional[LoRAConfig],
        ) -> None:
            ...
            self.requests: Dict[str, Request] = {}
            # Priority queues for requests.
            self.waiting: Deque[Request] = deque()
            self.running: List[Request] = []
            ...

The code above is from vllm/v1/core/scheduler.py. I also found the following:

    def _initialize_kv_caches(self,
                              cache_config: CacheConfig) -> Tuple[int, int]:
        num_gpu_blocks, _ = self.model_executor.determine_num_available_blocks()
        ...
        num_cpu_blocks = 0
        self.model_executor.initialize(num_gpu_blocks)
        return num_gpu_blocks, num_cpu_blocks

Could someone help explain this?
Answered by comaniac, Dec 11, 2024
Swapping was removed from v1 intentionally. Instead, we expect that when a preempted request is recomputed, most of its prompt tokens can be skipped thanks to prefix caching.
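To illustrate the idea of recomputing with prefix caching, here is a minimal toy sketch. It is not vLLM's implementation: `ToyPrefixCache`, `block_hashes`, `tokens_to_recompute`, and `BLOCK_SIZE` are hypothetical names made up for this example, and it assumes the cached blocks of the preempted request have not been evicted in the meantime.

```python
from typing import List, Set

BLOCK_SIZE = 4  # hypothetical number of tokens per KV-cache block


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Return one chained hash per full block; each hash covers the whole prefix."""
    hashes: List[int] = []
    parent = 0
    full_len = len(tokens) - len(tokens) % block_size  # ignore the partial tail block
    for start in range(0, full_len, block_size):
        parent = hash((parent, tuple(tokens[start:start + block_size])))
        hashes.append(parent)
    return hashes


class ToyPrefixCache:
    """Toy stand-in for a prefix cache: remembers which block hashes were computed."""

    def __init__(self) -> None:
        self.cached: Set[int] = set()

    def store(self, tokens: List[int]) -> None:
        # Record every full block of this token sequence as cached.
        self.cached.update(block_hashes(tokens))

    def num_cached_tokens(self, tokens: List[int]) -> int:
        # Count tokens covered by the longest run of cached leading blocks.
        hit = 0
        for h in block_hashes(tokens):
            if h not in self.cached:
                break
            hit += BLOCK_SIZE
        return hit


def tokens_to_recompute(cache: ToyPrefixCache, tokens: List[int]) -> int:
    """Tokens a rescheduled (previously preempted) request must still compute."""
    return len(tokens) - cache.num_cached_tokens(tokens)


if __name__ == "__main__":
    cache = ToyPrefixCache()
    prompt = list(range(10))
    print(tokens_to_recompute(cache, prompt))  # cold cache: all 10 tokens
    cache.store(prompt)                        # request ran, then was preempted
    print(tokens_to_recompute(cache, prompt))  # only the 2-token partial block
```

In this toy model the preempted request keeps nothing of its own; when it is scheduled again, only the tokens not covered by cached full blocks are recomputed. If the blocks had been evicted, the whole prompt would be recomputed, which is why the benefit is an expectation rather than a guarantee.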
Answer selected by Ghjk94522