A collection of toy implementations of memory-efficient attention.

All numpy implementations are in `attn.py`, and there is a CUDA implementation in `attn_chunk_q_chunk_kv_cuda/attn_chunk_q_chunk_kv_kernel.cu`.
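
For a flavor of the approach, here is a minimal numpy sketch of chunked-query / chunked-KV attention with an online (streaming) softmax. The function name and chunk sizes are illustrative; this is not the exact code in `attn.py`.

```python
import numpy as np

def chunked_attention(q, k, v, q_chunk=64, kv_chunk=64):
    """Toy memory-efficient attention: iterate over query and key/value chunks,
    keeping running softmax statistics so the full [n_q, n_kv] score matrix is
    never materialized. Illustrative sketch only."""
    n_q, d = q.shape
    out = np.zeros((n_q, v.shape[1]))
    scale = 1.0 / np.sqrt(d)
    for i in range(0, n_q, q_chunk):
        qc = q[i:i + q_chunk] * scale
        acc = np.zeros((qc.shape[0], v.shape[1]))      # running weighted sum of values
        row_max = np.full((qc.shape[0], 1), -np.inf)   # running max of scores
        row_sum = np.zeros((qc.shape[0], 1))           # running softmax denominator
        for j in range(0, k.shape[0], kv_chunk):
            kc, vc = k[j:j + kv_chunk], v[j:j + kv_chunk]
            s = qc @ kc.T                              # scores for this chunk pair
            new_max = np.maximum(row_max, s.max(axis=-1, keepdims=True))
            p = np.exp(s - new_max)
            correction = np.exp(row_max - new_max)     # rescale old stats to the new max
            row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
            acc = acc * correction + p @ vc
            row_max = new_max
        out[i:i + q_chunk] = acc / row_sum
    return out
```

Comparing its output against a naive `softmax(q @ k.T / sqrt(d)) @ v` is the kind of check the assertions in `test.py` presumably perform for the real implementations.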

## Building the CUDA implementation

```sh
cd attn_chunk_q_chunk_kv_cuda
python setup.py install
```
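
If the build succeeds, the compiled extension should be importable from Python. The snippet below is a hypothetical smoke test: the module name is guessed from the directory name, and it assumes `setup.py` builds a PyTorch CUDA extension, so treat `test.py` as the authoritative usage example.

```python
# Hypothetical smoke test -- the actual module name and framework may differ.
import torch                        # assumed dependency for the CUDA extension
import attn_chunk_q_chunk_kv_cuda   # guessed from the directory name

print(attn_chunk_q_chunk_kv_cuda.__file__)  # confirms the compiled extension loaded
```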
## Running the tests

There is no formal testing framework; just run

```sh
python test.py
```

If no assertion errors are raised, the tests pass :)