
feat: paged attention v2 #1183

Merged: 2 commits into main on Oct 23, 2023

Conversation

OlivierDehaene (Contributor)

No description provided.

@dongs0104 (Contributor)

Hi @OlivierDehaene, I have a question about this and the flash attention kernel.
Why do you build the kernels externally instead of installing them through pip?

@OlivierDehaene (Contributor, Author)

@dongs0104, it's mainly to keep the final Docker images small. Since we only use the attention/paged attention kernels, we don't want to bundle the rest of those libraries, as the image is already big.

For users that do not rely on the docker image, installing through pip is 100% fine.
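As a minimal sketch of that pip route (the package names and build flag below are my assumptions about the upstream kernel packages, not something specified in this PR):

```bash
# Sketch of the pip route instead of the Dockerfile's from-source kernel builds.
# Package names and the flash-attn build flag are assumptions, not from this PR.
pip install flash-attn --no-build-isolation  # flash attention kernels
pip install vllm                             # ships the paged attention kernels
```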

OlivierDehaene merged commit 12590fd into main on Oct 23, 2023.
OlivierDehaene deleted the feat/paged_attention_v2 branch on Oct 23, 2023 at 10:29.
@RonanKMcGovern

@OlivierDehaene, does paged attention v2 then work by default, with no additional flags or installs required? Thanks, and thanks for your work on it.

@OlivierDehaene (Contributor, Author)

Yes, as long as you use ghcr.io/huggingface/text-generation-inference:sha-12590fd or later.
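
For example, a hedged sketch of running such an image; the model id, port, and volume below are illustrative placeholders, not values from this thread:

```bash
# Sketch: run a TGI image at or after commit 12590fd so the paged
# attention v2 kernels are included by default. Model id, port, and
# volume are placeholders.
model=bigscience/bloom-560m   # placeholder model id
volume=$PWD/data              # persist downloaded weights between runs

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:sha-12590fd \
  --model-id $model
```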
