Hi. Is there a way to build the CUDA kernels with a head size other than 128? For example, to be able to use KV-cache quantization with GPU acceleration for models like gemma-2, gemma-3, and others. People often run into similar problems, for example here: #12624
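For context, a minimal sketch of the build and runtime flags I understand to be involved (assuming a standard llama.cpp CMake build; `GGML_CUDA` and `GGML_CUDA_FA_ALL_QUANTS` are existing CMake options, the model filename is just a placeholder, and whether the quantized-KV FlashAttention kernels actually get compiled for head sizes other than 128 is exactly my open question):

```sh
# Build with CUDA and all quantized-KV FlashAttention kernel variants enabled
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --config Release -j

# Run with FlashAttention and a q8_0-quantized KV cache
# (model path is a placeholder, not an actual file from this thread)
./build/bin/llama-cli -m ./models/gemma-2-9b-it-Q4_K_M.gguf -fa \
    -ctk q8_0 -ctv q8_0 -p "Hello"
```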