
feat: paged attention v2 #1183

Merged: 2 commits into main on Oct 23, 2023

Conversation

OlivierDehaene (Contributor)

No description provided.

@dongs0104 (Contributor)

Hi @OlivierDehaene, I have a question about this and the flash attention kernel.
Why do you build the kernels externally instead of installing them through pip?

@OlivierDehaene (Contributor, Author)

@dongs0104, it's mainly to keep the final Docker images small. Since we only use the attention/paged attention kernels, we don't want to bundle the rest of those libraries, as the image is already big.

For users that do not rely on the docker image, installing through pip is 100% fine.
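As a minimal sketch of that pip route (the package names and build flag below are my assumptions about the upstream kernel packages, not something specified in this PR):

```bash
# Sketch of the pip route instead of the Dockerfile's from-source kernel builds.
# Package names and the flash-attn build flag are assumptions, not from this PR.
pip install flash-attn --no-build-isolation  # flash attention kernels
pip install vllm                             # ships the paged attention kernels
```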

OlivierDehaene merged commit 12590fd into main on Oct 23, 2023.
OlivierDehaene deleted the feat/paged_attention_v2 branch on Oct 23, 2023 at 10:29.
@RonanKMcGovern

@OlivierDehaene, does paged attention v2 then work by default, with no additional flags or installs required? Thanks, and thanks for your work on it.

@OlivierDehaene (Contributor, Author)

Yes, as long as you use ghcr.io/huggingface/text-generation-inference:sha-12590fd or later.
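
For example, a hedged sketch of running such an image; the model id, port, and volume below are illustrative placeholders, not values from this thread:

```bash
# Sketch: run a TGI image at or after commit 12590fd so the paged
# attention v2 kernels are included by default. Model id, port, and
# volume are placeholders.
model=bigscience/bloom-560m   # placeholder model id
volume=$PWD/data              # persist downloaded weights between runs

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
  ghcr.io/huggingface/text-generation-inference:sha-12590fd \
  --model-id $model
```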
