Thanks a lot for the great introduction; this is super helpful! A quick question about the curve of sampling latency versus batch size: why is there a latency bump from around batch size 130 to 140 for PyTorch?
Hi @platypus1989, it might be because of a change in kernel choice (e.g., a grid-size bump from 128 to 256) at different batch sizes. Maybe @xslingcn can provide the raw trace file and check the kernel configuration under different batch sizes.
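One way to narrow this down is to sweep batch sizes and time the sampling step directly, then compare against the kernels recorded in a profiler trace. Below is a minimal CPU-only sketch of such a sweep (NumPy, with a hypothetical `sample_batch` stand-in for the GPU kernel; on an actual GPU you would time with `torch.cuda.Event` and inspect kernel/grid configuration via `torch.profiler`):

```python
import time
import numpy as np

def sample_batch(probs, rng):
    # Inverse-CDF sampling per row: a CPU stand-in for the GPU sampling kernel.
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random((probs.shape[0], 1))
    return (cdf < u).sum(axis=-1)

def time_sampling(batch_sizes, vocab=32000, iters=5, seed=0):
    # Returns {batch_size: mean latency in ms} for the sampling step alone.
    rng = np.random.default_rng(seed)
    results = {}
    for bs in batch_sizes:
        logits = rng.standard_normal((bs, vocab)).astype(np.float32)
        p = np.exp(logits - logits.max(axis=-1, keepdims=True))
        p /= p.sum(axis=-1, keepdims=True)
        sample_batch(p, rng)  # warm-up
        t0 = time.perf_counter()
        for _ in range(iters):
            sample_batch(p, rng)
        results[bs] = (time.perf_counter() - t0) / iters * 1e3
    return results
```

If the bump comes from a kernel-configuration change, the trace should show a different kernel name or launch geometry on either side of the discontinuity (e.g. between batch sizes 128 and 136), while this kind of sweep pins down exactly where the latency jumps.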
Background
Sorting-Free GPU Kernels for LLM Sampling | FlashInfer
https://flashinfer.ai/2025/03/10/sampling.html
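The linked post covers sampling without a full sort over the vocabulary. As a rough toy illustration of the rejection idea (NumPy, hypothetical helper, not the actual FlashInfer kernel): draw from the full distribution and accept a token only if it lies in the top-p nucleus, where membership is decided by a reduction rather than a sort.

```python
import numpy as np

def top_p_sample(probs, p, rng, max_tries=64):
    # Rejection loop: sample from the full distribution, accept only
    # tokens inside the top-p nucleus. Membership check is sort-free:
    # token x is in the nucleus iff the total mass of strictly
    # more-probable tokens is still below p.
    for _ in range(max_tries):
        x = rng.choice(len(probs), p=probs)
        if probs[probs > probs[x]].sum() < p:
            return x
    return x  # fallback; rejection rarely exhausts max_tries for sane p
```

Because rejection from the full distribution conditioned on nucleus membership yields the renormalized nucleus distribution, this toy version matches top-p semantics (up to ties) while only ever using sums and comparisons.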