vLLM Optimization: A PagedAttention Source Code Walkthrough - Zhang #213

2025-02-12T02:12:59Z

giscus[bot]
bot Feb 12, 2025

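vLLM Optimization: A PagedAttention Source Code Walkthrough - Zhang

Works on LLM inference deployment, vision algorithm development, model compression and deployment, and algorithm SDK development; a committed lifelong learner. LLM_Infer summarizes the design of vLLM's PagedAttention kernel and the module flow that dynamically allocates and manages KV cache memory. There are three main difficulties: creating and managing block_tables; computing how many memory blocks a GPU device can allocate for a given model; and reworking the thread index and offset calculations in the PagedAttention kernel code so that they go through block_tables. All three take repeated, careful reading of the code to understand clearly.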
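To make two of these difficulties concrete, here is a minimal Python sketch, not vLLM's actual code. The function names `num_kv_cache_blocks` and `physical_slot`, their parameters, and the example model shape are all assumptions chosen for illustration. The first function estimates how many KV cache blocks fit in a given amount of free GPU memory; the second shows how a token position is translated into a physical block and an in-block offset through a block table.

```python
def num_kv_cache_blocks(free_bytes, block_size, num_layers,
                        num_kv_heads, head_dim, dtype_bytes=2):
    """Estimate how many KV cache blocks fit in free_bytes (hypothetical helper).

    One block stores block_size tokens of both K and V (the factor 2)
    for every layer of the model.
    """
    bytes_per_block = (2 * num_layers * block_size
                       * num_kv_heads * head_dim * dtype_bytes)
    return free_bytes // bytes_per_block


def physical_slot(block_table, token_idx, block_size):
    """Map a token position to (physical block id, offset inside the block).

    block_table[i] holds the physical block id of the i-th logical block
    of one sequence (hypothetical layout for illustration).
    """
    logical_block = token_idx // block_size
    offset = token_idx % block_size
    return block_table[logical_block], offset


if __name__ == "__main__":
    # Assumed example shape: 32 layers, 32 KV heads, head_dim 128, fp16.
    blocks = num_kv_cache_blocks(free_bytes=8 * 1024**3, block_size=16,
                                 num_layers=32, num_kv_heads=32, head_dim=128)
    print("allocatable blocks:", blocks)
    # Token 20 with block_size 16 lives in logical block 1, offset 4.
    print("slot:", physical_slot([7, 3, 9], token_idx=20, block_size=16))
```

The indirection in `physical_slot` is the idea the kernel has to follow as well: instead of assuming a sequence's keys and values are contiguous, each access first looks up the physical block in the block table and then applies the offset within that block.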

https://www.armcvai.cn/2024-11-17/vllm-pagedattention.html

Aukarous · 2025-02-12T02:13:03Z

Aukarous
Feb 12, 2025 — with giscus

The code font looks really nice. What's it called?

0 replies
