Commit a23052b

Fix PiecewiseCompileInterpreter
This PR fixes the other issue discovered in vllm-project#16859 when upgrading from PyTorch 2.6 to PyTorch 2.7. I don't know why the code used to work in PyTorch 2.6, but the explanation is:

- When we are running PiecewiseCompileInterpreter, we end up doing FakeTensor propagation.
- FakeTensor propagation requires `enable_python_dispatcher` to work. The mechanism is that some of our "C++ implementations" for operations, like matmul, force specialization of dynamic shapes. torch.compile works around this by replacing PyTorch's "C++ implementation" for matmul with a Python-based implementation that does not force specialization.

Test Plan:

- Ran `pytest -v tests/models/test_transformers.py -k test_models[meta-llama/Llama-3.2-1B-Instruct-transformers]` with PyTorch >= 2.7 and vllm-project#17330, and verified that the test passes.

Signed-off-by: rzou <[email protected]>
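The pattern the fix enables can be sketched in isolation: running FakeTensor propagation under both `FakeTensorMode` and `enable_python_dispatcher`, so ops like matmul go through the Python-based implementations instead of the C++ ones. This is a minimal illustration assuming PyTorch >= 2.7; the tensor shapes and variable names here are invented for the example, not taken from vLLM.

```python
# Minimal sketch of FakeTensor propagation with the Python dispatcher
# enabled, mirroring the context managers used in the fix.
import torch
from torch._subclasses.fake_tensor import FakeTensorMode
from torch._dispatch.python import enable_python_dispatcher

fake_mode = FakeTensorMode()

# Convert real inputs to fake tensors, as PiecewiseCompileInterpreter
# does with fake_mode.from_tensor(...) for its args.
fake_a = fake_mode.from_tensor(torch.randn(2, 3))
fake_b = fake_mode.from_tensor(torch.randn(3, 4))

# Entering both contexts is the essence of the one-line change:
# with self.fake_mode, enable_python_dispatcher(): ...
with fake_mode, enable_python_dispatcher():
    out = fake_a @ fake_b  # shape propagates without real computation

assert out.shape == torch.Size([2, 4])
```

Without `enable_python_dispatcher()`, the same matmul can hit a C++ kernel path that forces specialization of dynamic shapes, which is what broke under PyTorch 2.7.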
1 parent 696259c commit a23052b

File tree

1 file changed (+2, -1 lines)


vllm/compilation/backends.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -10,6 +10,7 @@

 import torch
 import torch.fx as fx
+from torch._dispatch.python import enable_python_dispatcher

 import vllm.envs as envs
 from vllm.config import CompilationConfig, VllmConfig
@@ -269,7 +270,7 @@ def run(self, *args):
             self.fake_mode.from_tensor(t) if isinstance(t, torch.Tensor) else t
             for t in args
         ]
-        with self.fake_mode:
+        with self.fake_mode, enable_python_dispatcher():
             return super().run(*fake_args)

     def call_module(self, target: torch.fx.node.Target,
```
