
Commit 008891b

liangfu and JF-D authored and committed
[Neuron][Kernel] NKI-based flash-attention kernel with paged KV cache (vllm-project#11277)
Signed-off-by: Liangfu Chen <[email protected]>
Co-authored-by: Jiangfei Duan <[email protected]>
1 parent 0ae8f3e commit 008891b

3 files changed: +1126 -1 lines changed


.buildkite/run-neuron-test.sh

Lines changed: 1 addition & 1 deletion
@@ -54,4 +54,4 @@ docker run --rm -it --device=/dev/neuron0 --device=/dev/neuron1 --network host \
     -e "NEURON_COMPILE_CACHE_URL=${NEURON_COMPILE_CACHE_MOUNT}" \
     --name "${container_name}" \
     ${image_name} \
-    /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py"
+    /bin/bash -c "python3 /workspace/vllm/examples/offline_inference/neuron.py && python3 -m pytest /workspace/vllm/tests/neuron/ -v --capture=tee-sys"
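
The changed line chains the example script and the new pytest run with `&&`, so the tests should start only if the example exits with status 0, and a failure in either step should surface as a non-zero status from `bash -c`. A minimal sketch of that behavior, using hypothetical stand-in functions (`run_example`, `run_tests`) rather than the real commands:

```shell
#!/bin/sh
# Hypothetical stand-ins for the two commands chained in the changed line.
run_example() { return "$1"; }      # pretend neuron.py exited with status $1
run_tests()   { echo "tests ran"; } # pretend pytest ran and passed

# Example succeeds (status 0): the test step runs.
ok_status=0
ok_out=$(run_example 0 && run_tests) || ok_status=$?

# Example fails (status 1): '&&' short-circuits, the test step is skipped,
# and the failing status is what the caller sees.
fail_status=0
fail_out=$(run_example 1 && run_tests) || fail_status=$?

echo "ok:   status=$ok_status output='$ok_out'"
echo "fail: status=$fail_status output='$fail_out'"
```

Under this reading, a broken example script would fail the CI step before any test in `tests/neuron/` is collected.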
