[Bug][V0][Triton MLA][GGUF]: DeepSeek R1 GGUF starts producing gibberish towards the end of a longer generation #15340
Example output:
Passport's service provider registers itself automatically, but you need to, in Next, you need to run php artisan migrate Then, run php artisan passport:install After this command runs, please add <?php
namespace App;
use Laravel\Passport\HasApiTokens;
use Illuminate\Notifications\Notifiable;
use Illuminate\Contracts\Auth\MustVerifyEmail;
use Illuminate\Foundation\Auth\User as Authenticatable;
class User extends Authenticatable
{
use HasApiTokens, Notifiable;
} Next, you should, in <?php
namespace App\Providers;
use Laravel\Passport\Passport;
use Illuminate\Support\Facades\Gate;
use Illuminate\Foundation\Support\Providers\AuthServiceProvider as ServiceProvider;
class AuthServiceProvider extends ServiceProvider
{
/**
* The policy mappings for the application.
*
* @var array
*/
protected $policies = [
'App\Model' => 'App\Policies\ModelPolicy',
];
/**
* Register any authentication / authorization services.
*
* @return void
*/
public function boot()
{
$this->registerPolicies();
Passport::routes();
}
} Finally, in $app->register(App\Providers\AuthServiceProvider::class); Configuration In /**
* Register any authentication / authorization services.
*
* @return void
*/
public function boot()
{
$this->registerPolicies();
Passport::routes();
Passport::tokensExpireIn(now()->addDays(15));
Passport::refreshTokensExpireIn(now()->addDays(30));
}
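Issuing Access Tokens When issuing tokens with the OAuth2 authorization code grant, your application needs to interact with Passport's token-issuing endpoint to issue access tokens to other clients. Writing the entire authentication flow this way can be tedious, but don't worry: the following sections will walk you through the whole process step by step. First, have the client application send a request to your application through the user's browser, with request parameters including Of these, Client ID and Secret If you haven't created any clients yet, you can first use php artisan passport:client --redirect_uri=http://localhost
Client ID: 3
Client secret: KSPXwy5n1MZmxvIln6k6ubunh3X0aw5asdfkDSF Redirecting the Request Next, the client application needs to redirect the user's browser to your application's Route::get('/redirect', function (Request $request) {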
$request->session()->put('state', $state = Str::random(40));
$query = http_build_query([
'client_id' => 'client-id',
'redirect_uri' => 'http://example.com/callback',
'response_type' => 'code',
'scope' => '',
'state' => $state,
]);
return redirect('http://your-app.com/oauth/authorize?'.$query);
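}); Note that If the user denies authorization, they will be redirected to Converting Authorization Codes to Access Tokens If the user approves authorization, the client application will receive an authorization code. Next, the client application needs to convert the authorization code into an access token. To do so, the client application needs to send a POST request to your application at $response = Http::post('http://your-app.com/oauth/token', [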
'grant_type' => 'authorization_code',
'client_id' => 'client-id',
'client_secret' => 'client-secret',
'code' => $request->code,
'redirect_uri' => $request->redirect_uri,
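]); The response to this request will contain Refreshing Tokens Access tokens are usually short-lived; once a token expires, you need to use the refresh token to obtain a new access token. The client application needs to send a POST request to your application at $response = Http::post('http://your-app.com/oauth/token', [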
'grant_type' => 'refresh_token',
'refresh_token' => 'the-refresh-token',
'client_id' => 'client-id',
'client_secret' => 'client-secret',
'scope' => '',
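]); The response will contain new Password Grant Tokens The OAuth2 password grant lets you obtain an access token directly with a username and password. This is suitable for clients you trust, such as your mobile app. To use the password grant, first create a password grant client: php artisan passport:client --password This command will prompt you for the client's name and then return the client's ID and secret. Next, the client application needs to send a POST request to your application at $response = Http::post('http://your-app.com/oauth/token', [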
'grant_type' => 'password',
'client_id' => 'client-id',
'client_secret' => 'client-secret',
'username' => '[email protected]',
'password' => 'my-password',
'scope' => '',
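]); The response to this request will contain Client Credentials Grant Tokens The client credentials grant is suitable for machine-to-machine authentication, for example when you have an API that needs to be accessed by another service rather than by a user. To use the client credentials grant, first create a client: php artisan passport:client --client Then, the client application needs to send a POST request to your application at $response = Http::post('http://your-app.com/oauth/token', [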
'grant_type' => 'client_credentials',
'client_id' => 'client-id',
'client_secret' => 'client-secret',
'scope' => 'your-scope',
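]); The response will contain Implicit Grant Tokens The implicit grant is similar to the authorization code grant, but it issues the access token directly in the browser instead of through an intermediate authorization code. This approach is typically used for single-page or purely front-end applications. To use the implicit grant, first create a client: php artisan passport:client --public Then, the client application needs to redirect the user's browser to your application's Route::get('/redirect', function () {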
$query = http_build_query([
'client_id' => 'client-id',
'redirect_uri' => 'http://example.com/callback',
'response_type' => 'token',
'scope' => '',
'state' => Str::random(40),
]);
return redirect('http://your-app.com/oauth/authorize?'.$query);
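}); If the user approves authorization, they will be redirected to Personal Access Tokens Sometimes users may want to issue an access token to themselves without going through the full OAuth2 flow. Passport provides this feature, letting users issue personal access tokens through your web UI. First, you need to create a client for issuing personal access tokens: php artisan passport:client --personal This command will prompt you for the client's name and then return the client's ID and secret. Next, you need to create a route that lets users view their tokens and create new ones: Route::get('/settings', function () {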
return view('settings', [
'tokens' => Auth::user()->tokens,
'clients' => Auth::user()->clients,
]);
})->middleware('auth');
Route::post('/settings/token', function (Request $request) {
$request->validate([
'name' => 'required|max:255',
]);
$token = Auth::user()->createToken($request->name);
return view('token', ['token' => $token->accessToken]);
})->middleware('auth'); In the view, you can display the user's existing tokens and provide a form for creating new ones: <form method="POST" action="/settings/token">
@csrf
<input type="text" name="name" placeholder="Token name">
<button type="submit">Create Token</button>
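</form> After the user submits the form, a new personal access token is created and displayed to the user. Protecting Routes Now that users can obtain access tokens, you need to protect your API routes so that only users holding a valid access token can access them. First, in $app->routeMiddleware([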
'auth' => App\Http\Middleware\Authenticate::class,
]); Then, in your API routes, use $router->group(['middleware' => 'auth:api'], function () use ($router) {
$router->get('/user', function () {
return auth()->user();
});
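}); Now, when users access a protected route, they need to include the access token in the request headers: curl -H "Accept: application/json" -H "Authorization: Bearer {access-token}" http://your-app.com/user Revoking Tokens Users may need to revoke tokens they have issued. To revoke a token, you can, in public function tokens()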
{
return $this->hasMany(Token::class);
} Then, you can let users revoke tokens through your web UI: Route::delete('/settings/token/{token}', function ($tokenId) {
Auth::user()->tokens()->where('id', $tokenId)->delete();
return redirect('/settings');
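})->middleware('auth'); When the user visits this route, the specified token will be revoked. Token Scopes Passport lets you define scopes for tokens to limit their access permissions. First, in use Laravel\Passport\Passport;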
Passport::tokensCan([
'place-orders' => 'Place orders',
'check-status' => 'Check order status',
]); When a user authorizes a client, these scopes can be requested: Route::get('/redirect', function () {
$query = http_build_query([
'client_id' => 'client-id',
'redirect_uri' => 'http://example.com/callback',
'response_type' => 'code',
'
Could you share your
Of course. I have tried a few different variations without much luck (different dtypes, etc.):
Hmm, looks fine to me. Do you know which layer is the first to produce NaNs? I'll try to reproduce this, but I haven't seen it since switching to bfloat16 (I'm using Q4).
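A plausible reason bfloat16 avoids the problem (an assumption, not something confirmed in this thread) is dynamic range: float16's largest finite value is 65504, so larger intermediate results overflow to `inf`, while bfloat16 keeps float32's 8-bit exponent and gives up mantissa precision instead. A minimal PyTorch sketch of the difference, using an arbitrary illustrative value of 300:

```python
import torch

# Largest finite value each 16-bit format can hold.
print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # roughly 3.39e38, same order as float32

x = torch.tensor([300.0, 300.0])

# Dot-product-style reduction: each product is 90000, already above 65504.
fp16_sum = (x.half() * x.half()).sum()
bf16_sum = (x.bfloat16() * x.bfloat16()).sum()
print(fp16_sum)  # inf: float16 overflowed
print(bf16_sum)  # finite, but rounded to bfloat16's coarser significand

# Once an inf exists, ordinary arithmetic such as inf - inf yields NaN,
# which then propagates through every later layer.
print(fp16_sum - fp16_sum)  # nan
```

This only illustrates the numeric-range argument; it does not show where the overflow happens inside the GGUF or MLA kernels.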
How would I go about checking which layer produces NaNs? If you let me know, I'll do that, no problem! Thank you! Unfortunately, I can't fit Q4!
@SzymonOzog, have you been able to reproduce this on your end?
@davidsyoung Sorry, I've been very GPU-poor this week and unable to run it locally; hopefully I'll find some time to test it next week.
No panic whatsoever. I can test on my side if there are any patches you'd like me to apply.
@davidsyoung Last time, I tested it by putting an assertion after every call to a GGUF-quantized layer in https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/layers/quantization/gguf.py and seeing where the NaNs are returned. It would also be good to check whether they are running in bf16.
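One way to approximate this suggestion without editing `gguf.py` is to attach PyTorch forward hooks that assert every leaf module's output is finite and report its dtype. This is a sketch under assumptions, not the procedure @SzymonOzog used: `add_nan_assertions` and `InjectNaN` are hypothetical names, the toy model stands in for the real one, and reaching the model object inside vLLM's tensor-parallel workers is version-specific and not shown. Hooks only run when modules execute eagerly, so CUDA-graph capture would likely need to be disabled (for example with `--enforce-eager`).

```python
import torch
from torch import nn


def add_nan_assertions(model: nn.Module) -> list:
    """Hypothetical helper: hook every leaf module and fail on NaN/Inf outputs.

    The first module whose output is non-finite raises an AssertionError that
    names the module and the output dtype (e.g. to confirm it runs in bf16).
    Returns the hook handles so the checks can be removed afterwards.
    """
    handles = []
    for name, module in model.named_modules():
        if next(module.children(), None) is not None:
            continue  # skip containers; only leaf layers do the actual compute

        def check(mod, inputs, output, name=name):
            # Some layers return tuples (e.g. output plus bias), so normalize.
            outputs = output if isinstance(output, (tuple, list)) else (output,)
            for t in outputs:
                if isinstance(t, torch.Tensor) and t.is_floating_point():
                    if not torch.isfinite(t).all():
                        raise AssertionError(
                            f"non-finite output in {name!r} "
                            f"({type(mod).__name__}, dtype={t.dtype})"
                        )

        handles.append(module.register_forward_hook(check))
    return handles


class InjectNaN(nn.Module):
    """Toy layer standing in for a kernel that overflows (illustration only)."""

    def forward(self, x):
        return x * float("nan")


model = nn.Sequential(nn.Linear(4, 4), InjectNaN(), nn.Linear(4, 4))
handles = add_nan_assertions(model)
try:
    model(torch.randn(2, 4))
except AssertionError as err:
    print(err)  # non-finite output in '1' (InjectNaN, dtype=torch.float32)
finally:
    for h in handles:
        h.remove()
```

The first assertion to fire points at the layer worth inspecting. Each check forces a device synchronization, so this is only suitable for debugging runs.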
I can't say I'm good enough at coding to do this; however, I believe I have more useful information about this bug. I think it may actually be in the V0 Triton MLA engine! It also happens with this GPTQ quant: https://huggingface.co/OPEA/DeepSeek-R1-int4-gptq-sym-inc, with tp=16 and chunked prefill enabled.
@DefTruth, tagging you since you seem to be working on MLA quite a bit, including a recent PR for the V1 engine. Could this be related?
Your current environment
The output of `python collect_env.py`
🐛 Describe the bug
When running inference with the DeepSeek R1 `Q3_K_M` GGUF quant, the model starts producing gibberish towards the end of a longer generation. I have followed the directions in #13167 (comment) for the `--tokenizer` and `--hf-config-path` configuration. I have tested various images, including nightly and the most recent `0.8.1` release, and the issue persists.
I would appreciate some direction on this. vLLM is by far the fastest inference engine for GGUF on my 16x3090 config, but this bug is a blocker for longer generations. @SzymonOzog mentioned experiencing a similar issue, with the model overflowing and producing NaNs, but that was fixed (ref: #13167 (comment)).
Unfortunately, I'm at a bit of a loss as to how to fix this myself.
Run command:
Run log