From the title, this sounds a lot like #3565, where the gain depends on efficient GPU kernels (especially for quantized models) to pull off the speed boost; the number of FLOPS would also make a big difference. Since Meta has done some work on this, check whether they already support quantized models. Most researchers test their optimizations on H100/A100 at fp16.
Prerequisites
Feature Description
https://x.com/AIatMeta/status/1851327605716435011?t=uCwZiiCcZqPQz0O9NjLfoQ&s=19
Motivation
Meta releases Layer Skip, an end-to-end solution for accelerating LLMs
Possible Implementation
No response
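
For context, here is a rough sketch of the self-speculative decoding loop that the LayerSkip announcement describes: draft a few tokens by exiting early after the first `exit_layer` transformer layers, then verify the whole draft with one full-model pass and keep the longest agreeing prefix. This is an illustration only; `model.forward_layers`, `lm_head`, and `num_layers` are hypothetical placeholders (not an existing llama.cpp or HF API), and a real implementation would also reuse the draft's KV cache for the shared early layers, which this sketch omits for clarity.

```python
# Minimal sketch of LayerSkip-style self-speculative decoding (greedy case).
# The model interface used here is hypothetical; only the control flow matters.
import torch

@torch.no_grad()
def self_speculative_decode(model, lm_head, tokens, exit_layer, draft_len, max_new):
    # tokens: 1-D LongTensor holding the prompt token ids
    target_len = tokens.numel() + max_new
    while tokens.numel() < target_len:
        # --- draft phase: run only the first `exit_layer` layers (early exit) ---
        draft = []
        ctx = tokens.clone()
        for _ in range(draft_len):
            h = model.forward_layers(ctx, num_layers=exit_layer)   # hypothetical call
            next_id = lm_head(h[-1]).argmax(-1)                    # greedy draft token
            draft.append(next_id)
            ctx = torch.cat([ctx, next_id.view(1)])

        # --- verify phase: one full-model pass over prompt + draft ---
        h_full = model.forward_layers(ctx, num_layers=model.num_layers)
        # predictions of the full model at each drafted position
        preds = lm_head(h_full[tokens.numel() - 1:-1]).argmax(-1)

        # accept the longest prefix of the draft that the full model agrees with,
        # and take the full model's token at the first mismatch (if any)
        n_accept = 0
        for d, p in zip(draft, preds):
            if d.item() != p.item():
                break
            n_accept += 1
        if n_accept < len(draft):
            accepted = draft[:n_accept] + [preds[n_accept]]
        else:
            accepted = draft
        tokens = torch.cat([tokens, torch.stack(accepted).view(-1)])
    return tokens
```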