How to integrate with HF with minimal modification? #242
Replies: 7 comments
-
Thanks for your interest and great question! You can install vLLM from source and directly modify the model code.
-
This is a huge change. Is there an easier way to do this with LLaMA? I don't want to insert this code into my existing transformers-based project.
-
Can you point out in the documentation which modifications are necessary, or provide a tutorial on the steps for adapting a model? The "Rewrite the forward methods" section of the documentation is too brief.
-
@lucasjinreal Is your model different from the original LLaMA? If not, you can simply pass the path to your model weights when initializing vLLM.
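A minimal sketch of that, assuming the weights are a stock LLaMA checkpoint in HF format (the path below is a placeholder for your own directory):

```python
from vllm import LLM, SamplingParams

# No model-code changes are needed for a stock LLaMA architecture:
# just point vLLM at the HF-format weights directory.
llm = LLM(model="/path/to/llama-weights")  # placeholder path

params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```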
-
@liujuncn Thanks for your feedback. We'll add more detail to the docs. To address your issue quickly, could you share the specific model you're interested in using with vLLM? Depending on the model architecture, we might be able to add support for it promptly.
-
@WoosukKwon Can you be more specific? For example, I have an HF-based model. How can I specify it in vLLM? Also, my generation loop uses streaming; is streaming supported?
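For what it's worth, streaming is exposed through vLLM's async engine, which yields partial outputs as tokens are produced. A minimal sketch, with the model path again a placeholder:

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Placeholder path: substitute your own HF-format weights directory.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(model="/path/to/llama-weights")
)

async def stream(prompt: str) -> None:
    params = SamplingParams(temperature=0.8, max_tokens=128)
    # Each iteration yields a RequestOutput whose text covers everything
    # generated so far, so this prints a growing prefix of the answer.
    async for output in engine.generate(prompt, params, request_id="0"):
        print(output.outputs[0].text)

asyncio.run(stream("Hello, my name is"))
```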
-
For example, x-transformers lets you combine different tricks. So how would a custom model architecture be possible using vLLM?
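Newer vLLM releases expose a model registry for exactly this case. A hedged sketch, where `MyCustomForCausalLM` is a hypothetical class you would implement against vLLM's model interface:

```python
from vllm import LLM, ModelRegistry

# Hypothetical: your own architecture, implemented against vLLM's
# model interface (attention, layers, and weight loading).
from my_project.modeling import MyCustomForCausalLM

# Register the class under the architecture name that appears in the
# checkpoint's config.json, then load it like any other model.
ModelRegistry.register_model("MyCustomForCausalLM", MyCustomForCausalLM)
llm = LLM(model="/path/to/custom-weights")  # placeholder path
```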
-
From what I can see, these are all wrappers around vLLM. How can I integrate it with HF and get an out-of-the-box speedup for my existing model?
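If the model is a supported architecture, the integration can be as small as swapping the generation call; a sketch under that assumption (the checkpoint path is a placeholder):

```python
# Existing HF transformers loop (for comparison):
#   model = AutoModelForCausalLM.from_pretrained("/path/to/weights")
#   ids = tokenizer(prompt, return_tensors="pt").input_ids
#   out = model.generate(ids, max_new_tokens=128)

# vLLM drop-in for the same checkpoint: it reads the HF config and
# weights directly, so tokenization and decoding are handled for you.
from vllm import LLM, SamplingParams

llm = LLM(model="/path/to/weights")  # placeholder: same HF checkpoint dir
outputs = llm.generate(["My prompt"], SamplingParams(max_tokens=128))
print(outputs[0].outputs[0].text)
```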