Commit c47aafa

[BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171)
Signed-off-by: Nick Hill <[email protected]>
1 parent cfbca8a commit c47aafa

1 file changed: 3 additions, 1 deletion

vllm/v1/structured_output/__init__.py

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,6 @@
 from vllm.logger import init_logger
 from vllm.v1.structured_output.backend_types import (StructuredOutputBackend,
                                                      StructuredOutputGrammar)
-from vllm.v1.structured_output.backend_xgrammar import XgrammarBackend
 
 if TYPE_CHECKING:
     import numpy as np
@@ -47,6 +46,9 @@ def grammar_init(self, request: Request) -> None:
         if self.backend is None:
             backend_name = request.sampling_params.guided_decoding.backend_name
             if backend_name == "xgrammar":
+                from vllm.v1.structured_output.backend_xgrammar import (
+                    XgrammarBackend)
+
                 self.backend = XgrammarBackend(self.vllm_config)
             else:
                 raise ValueError(
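
The change above moves the `XgrammarBackend` import from module level into `grammar_init`, so the backend module is loaded only when a request first selects it. The following is a minimal, self-contained sketch of that deferred-import pattern, not vLLM code: `fake_heavy_backend`, `FakeBackend`, and `Manager` are hypothetical stand-ins invented for illustration, and the temporary module simply represents any module whose import has an expensive side effect (such as initializing CUDA).

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a backend module whose import is expensive.
# It is written to a temporary directory so the sketch is self-contained.
_tmp_dir = tempfile.mkdtemp()
Path(_tmp_dir, "fake_heavy_backend.py").write_text(
    textwrap.dedent(
        """
        # Imagine this module body touching the GPU at import time.
        class FakeBackend:
            def __init__(self, config):
                self.config = config
        """
    )
)
sys.path.insert(0, _tmp_dir)


class Manager:
    """Hypothetical analogue of the class that owns `grammar_init`."""

    def __init__(self, config):
        self.config = config
        self.backend = None

    def grammar_init(self, backend_name):
        if self.backend is None:
            if backend_name == "fake":
                # Deferred import: the module body runs here, on first use,
                # rather than when the enclosing module is imported.
                from fake_heavy_backend import FakeBackend

                self.backend = FakeBackend(self.config)
            else:
                raise ValueError(f"Unsupported backend: {backend_name}")


manager = Manager(config={"demo": True})
loaded_before = "fake_heavy_backend" in sys.modules  # not imported yet
manager.grammar_init("fake")
loaded_after = "fake_heavy_backend" in sys.modules   # imported on first use
print(loaded_before, loaded_after)
```

Constructing `Manager` does not load the stand-in module; only the first `grammar_init` call does. Later calls skip the branch because `self.backend` is already set, and a repeated `import` statement would in any case be served from `sys.modules`.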
