[Core][VLM] Test registration for OOT multimodal models #8717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Commits (42)

cbb9dfb  fix (ywang96)
7ae3e07  update doc (ywang96)
b67ed86  iterate (ywang96)
a9f3d3f  typo (ywang96)
1d174d5  update (ywang96)
4ec5b75  add test (ywang96)
0ce8165  update conftest (ywang96)
84094a4  add plugin loading to model config (ywang96)
0c36bb1  fix and add test (ywang96)
d203593  move plugin loading (ywang96)
a020de6  infer multimodality (ywang96)
51c961a  update doc (ywang96)
81629f8  format (ywang96)
ec204df  more robust check (ywang96)
adbb063  add back the TODO for woosuk (ywang96)
273ce7e  update (ywang96)
19c31d9  try better config (ywang96)
dbd198d  Fix CUDA re-initialization error (DarkLight1337)
263a4e7  Revert "Fix CUDA re-initialization error" (DarkLight1337)
b8e6e8d  try llava (ywang96)
85cedeb  Add debug script (DarkLight1337)
8952494  format (DarkLight1337)
989fb16  format (DarkLight1337)
732d462  Avoid CUDA reinitialization error (DarkLight1337)
bf369e5  Improve debug script (DarkLight1337)
571eda9  patch (ywang96)
af7e746  Merge branch 'main' into fix-oot-multi-modal (ywang96)
52b600b  switch (ywang96)
45fb02b  Try instead reducing model memory (DarkLight1337)
7c987e9  Reorder the tests (DarkLight1337)
45a6fa8  Iterate (DarkLight1337)
2732bc3  Merge branch 'main' into fix-oot-multi-modal (DarkLight1337)
1774fd5  Merge branch 'main' into fix-oot-multi-modal (DarkLight1337)
36f33f8  Merge branch 'main' into fix-oot-multi-modal (DarkLight1337)
83e86e4  Try limit `max_num_seqs` (DarkLight1337)
8f9f7b5  No need to set this anymore (DarkLight1337)
113d3f0  Remove the need for deferred imports (DarkLight1337)
2066ff3  Try separating out `test_accuracy.py` and `test_audio.py` (DarkLight1337)
3e1461e  Merge branch 'main' into fix-oot-multi-modal (DarkLight1337)
e399079  Enable lazy import (DarkLight1337)
cf980b4  Revert test pipeline (DarkLight1337)
dada11d  Update docs (DarkLight1337)
New file (33 additions): debug script for CUDA re-initialization errors
import importlib
import traceback
from typing import Callable
from unittest.mock import patch


def find_cuda_init(fn: Callable[[], object]) -> None:
    """
    Helper function to debug CUDA re-initialization errors.

    If `fn` initializes CUDA, prints the stack trace of how this happens.
    """
    from torch.cuda import _lazy_init

    stack = None

    def wrapper():
        nonlocal stack
        stack = traceback.extract_stack()
        return _lazy_init()

    with patch("torch.cuda._lazy_init", wrapper):
        fn()

    if stack is not None:
        print("==== CUDA Initialized ====")
        print("".join(traceback.format_list(stack)).strip())
        print("==========================")


if __name__ == "__main__":
    find_cuda_init(
        lambda: importlib.import_module("vllm.model_executor.models.llava"))
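
A usage sketch for the helper (illustrative only: the script's path in the repo is not shown in this view, so the import below is hypothetical):

# Hypothetical usage of find_cuda_init (the module path is assumed).
import importlib

from find_cuda_init import find_cuda_init

# Prints the offending stack trace only if importing the target module
# ends up calling torch.cuda._lazy_init() somewhere along the way.
find_cuda_init(
    lambda: importlib.import_module("vllm.model_executor.models.opt"))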
tests/plugins/vllm_add_dummy_model/vllm_add_dummy_model/__init__.py (8 additions, 20 deletions)
@@ -1,26 +1,14 @@
-from typing import Optional
-
-import torch
-
 from vllm import ModelRegistry
-from vllm.model_executor.models.opt import OPTForCausalLM
-from vllm.model_executor.sampling_metadata import SamplingMetadata
-
-
-class MyOPTForCausalLM(OPTForCausalLM):
-
-    def compute_logits(
-            self, hidden_states: torch.Tensor,
-            sampling_metadata: SamplingMetadata) -> Optional[torch.Tensor]:
-        # this dummy model always predicts the first token
-        logits = super().compute_logits(hidden_states, sampling_metadata)
-        if logits is not None:
-            logits.zero_()
-            logits[:, 0] += 1.0
-        return logits
 
 
 def register():
-    # register our dummy model
+    # Test directly passing the model
+    from .my_opt import MyOPTForCausalLM
+
     if "MyOPTForCausalLM" not in ModelRegistry.get_supported_archs():
         ModelRegistry.register_model("MyOPTForCausalLM", MyOPTForCausalLM)
+
+    # Test passing lazy model
+    if "MyLlava" not in ModelRegistry.get_supported_archs():
+        ModelRegistry.register_model("MyLlava",
+                                     "vllm_add_dummy_model.my_llava:MyLlava")
tests/plugins/vllm_add_dummy_model/vllm_add_dummy_model/my_llava.py (28 additions)
from typing import Optional

import torch

from vllm.inputs import INPUT_REGISTRY
from vllm.model_executor.models.llava import (LlavaForConditionalGeneration,
                                              dummy_data_for_llava,
                                              get_max_llava_image_tokens,
                                              input_processor_for_llava)
from vllm.model_executor.sampling_metadata import SamplingMetadata
from vllm.multimodal import MULTIMODAL_REGISTRY


@MULTIMODAL_REGISTRY.register_image_input_mapper()
@MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_llava_image_tokens)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_llava)
@INPUT_REGISTRY.register_input_processor(input_processor_for_llava)
class MyLlava(LlavaForConditionalGeneration):

    def compute_logits(
            self, hidden_states: torch.Tensor,
            sampling_metadata: SamplingMetadata) -> Optional[torch.Tensor]:
        # this dummy model always predicts the first token
        logits = super().compute_logits(hidden_states, sampling_metadata)
        if logits is not None:
            logits.zero_()
            logits[:, 0] += 1.0
        return logits
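
A usage sketch for the registered model: vLLM resolves a checkpoint to MyLlava when its config.json lists "MyLlava" under "architectures", and because the class is registered lazily, this module is only imported at that point. The checkpoint name and prompt template below are hypothetical:

# Usage sketch (hypothetical checkpoint "my-org/my-llava" whose config.json
# lists "MyLlava" in "architectures").
from PIL import Image

from vllm import LLM

llm = LLM(model="my-org/my-llava")

# Pass the image through vLLM's multimodal input dict.
outputs = llm.generate({
    "prompt": "USER: <image>\nWhat is in this picture? ASSISTANT:",
    "multi_modal_data": {"image": Image.open("example.jpg")},
})
print(outputs[0].outputs[0].text)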
tests/plugins/vllm_add_dummy_model/vllm_add_dummy_model/my_opt.py (19 additions)
from typing import Optional

import torch

from vllm.model_executor.models.opt import OPTForCausalLM
from vllm.model_executor.sampling_metadata import SamplingMetadata


class MyOPTForCausalLM(OPTForCausalLM):

    def compute_logits(
            self, hidden_states: torch.Tensor,
            sampling_metadata: SamplingMetadata) -> Optional[torch.Tensor]:
        # this dummy model always predicts the first token
        logits = super().compute_logits(hidden_states, sampling_metadata)
        if logits is not None:
            logits.zero_()
            logits[:, 0] += 1.0
        return logits
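
Both dummy classes share the same compute_logits override: zero the logits, then bump index 0, so greedy decoding deterministically emits token id 0. That gives the OOT registration tests a property to assert on without caring about real model weights. A standalone sketch of the invariant:

# Standalone sketch of the logits trick used by both dummy models.
import torch

logits = torch.randn(4, 32000)  # (num_tokens, vocab_size), arbitrary values
logits.zero_()
logits[:, 0] += 1.0

# Greedy decoding over these logits always picks token id 0.
assert torch.all(logits.argmax(dim=-1) == 0)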