Kernels generated with use_fp16_qk_reductions=true break the LogitsTransform implementation used by prefill kernels #936

I want to clarify the semantics of the `LogitsTransform` function declared at https://github.com/flashinfer-ai/flashinfer/blob/738460ff82e2230ebcc8dff50e49e1d6278e011a/include/flashinfer/attention/variants.cuh#L69

The `logits` parameter is templated and presumably can be `__half`. However, the expression that computes the new value of `logits` at https://github.com/flashinfer-ai/flashinfer/blob/738460ff82e2230ebcc8dff50e49e1d6278e011a/include/flashinfer/attention/variants.cuh#L75 fails to compile in that case, because `operator *` cannot be resolved for a `__half` operand and a `float` operand. The assignment back to `logits` will likely break as well, since it relies on an implicit `float` to `__half` conversion.
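The failure mode can be modeled on the host in plain C++17. This is a sketch, not CUDA code: `Half` is a hypothetical stand-in for `__half` that offers only explicit conversions to and from `float`, which is an assumption about how the failing build behaves (the exact cause in the real headers was not confirmed here). `has_mul` is a hypothetical helper trait that checks whether `A * B` is well-formed.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical stand-in for __half: stores a float internally but exposes only
// explicit conversions, so mixed Half/float arithmetic is rejected.
struct Half {
  float v;
  explicit Half(float f) : v(f) {}
  explicit operator float() const { return v; }
};

// Detects whether the expression `A * B` is well-formed.
template <typename A, typename B, typename = void>
struct has_mul : std::false_type {};

template <typename A, typename B>
struct has_mul<A, B, std::void_t<decltype(std::declval<A>() * std::declval<B>())>>
    : std::true_type {};

// `logits * scale` with Half logits and a float scale does not compile...
static_assert(!has_mul<Half, float>::value, "Half * float should be rejected");
// ...and assigning the float result back to Half would need an explicit cast.
static_assert(!std::is_convertible_v<float, Half>, "float -> Half needs a cast");
// Upcasting the logits to float first makes the product well-formed.
static_assert(has_mul<float, float>::value, "float * float is fine");
```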

The `LogitsTransform` template is invoked inside the `prefill` kernel at https://github.com/flashinfer-ai/flashinfer/blob/738460ff82e2230ebcc8dff50e49e1d6278e011a/include/flashinfer/attention/prefill.cuh#L690

I discovered this issue while working on a fix for #806, when compiling kernels that were generated with the `use_fp16_qk_reductions=true` flag passed to `aot_build_utils.generate`.

I can apply a fix using a `constexpr` cast from fp16 to fp32 and _vice versa_, either at the call site or inside `LogitsTransform`. But before making any mechanical changes, I wanted to clarify the intent of the implementation.
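A minimal host-side sketch of the second option (casting inside the transform), under stated assumptions: `Half` is a hypothetical stand-in for `__half` with only explicit conversions, `LogitsTransformSketch` is a hypothetical, simplified version of `LogitsTransform`, and the `float` multiplication stands in for whatever fp32 math line 75 actually performs. The `if constexpr` branches choose the path at compile time, so the `float` instantiation does no conversions at all.

```cpp
#include <type_traits>

// Hypothetical stand-in for __half: only explicit conversions to/from float,
// so `Half * float` does not compile (mirroring the reported error).
struct Half {
  float v;
  constexpr explicit Half(float f) : v(f) {}
  constexpr explicit operator float() const { return v; }
};

// Hypothetical, simplified LogitsTransform. The math always runs in fp32;
// for non-float logits the value is upcast first and explicitly downcast after.
template <bool apply_transform, typename T>
constexpr T LogitsTransformSketch(T logits, float scale) {
  if constexpr (!apply_transform) {
    return logits;  // transform disabled: pass the logits through unchanged
  } else if constexpr (std::is_same_v<T, float>) {
    return logits * scale;  // fp32 logits: no conversion needed
  } else {
    // fp16-like logits: upcast to float, compute, then cast back to T.
    float x = static_cast<float>(logits);
    return T(x * scale);
  }
}
```

Keeping the cast inside the transform leaves the call site in `prefill.cuh` unchanged and type-agnostic. The trade-off is that the transform's math then always runs in fp32, even when the kernel reduces in fp16, which may or may not match the intended semantics of `use_fp16_qk_reductions`.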
