Commit 1902354

Zhreyu, srinjoydutta03, and ariG23498
authored Oct 21, 2024
Issue #44 Add Gradio Demos for Llama Models - Initial Contribution (#67)
* Adding Gradio Demos
* Updated README, Added descriptions, changed chatbot interface
* Update gradio_demos/README.md
  Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
* Update gradio_demos/README.md
  Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
* Removed other demos, updated system_prompts for specific tasks, updated README
* Updated README
* Updated README, chatbot_demo
* Update README.md
* Simplify Chatbot Demo notebook and remove gradio_demos/README.md
* Added screenshot, refactored code and cell descriptions
* Refactor chat history handling for direct Gradio compatibility
* Synced README.md with upstream
* Synced README.md with upstream
* Remove screenshot from README
* Adding Gradio demo as a Python Script
* Add demo screenshot to assets
* Update path to screenshot in notebook

---------

Co-authored-by: Srinjoy <[email protected]>
Co-authored-by: Aritra Roy Gosthipaty <[email protected]>
Co-authored-by: Srinjoy Dutta <[email protected]>
1 parent 5994901 commit 1902354

File tree

5 files changed: +1412 −1 lines changed

 

.github/README.md

+6 −1
@@ -157,4 +157,9 @@ Seeking an entry-level RAG pipeline? This notebook guides you through building a
 ## Text Generation Inference (TGI) & API Inference with Llama Models
 Text Generation Inference (TGI) framework enables efficient and scalable deployment of Llama models. In this notebook we'll learn how to integrate TGI for fast text generation and to consume already deployed Llama models via Inference API:
 
-* [Text Generation Inference (TGI) with Llama Models](../llama_tgi_api_inference/tgi_api_inference_recipe.ipynb)
+* [Text Generation Inference (TGI) with Llama Models](../llama_tgi_api_inference/tgi_api_inference_recipe.ipynb)
+
+## Chatbot Demo with Llama Models
+Would you like to build a chatbot with Llama models? Here's a simple example to get you started.
+
+* [Chatbot with Llama Models](../gradio_demos/chatbot_demo.ipynb)

.gitignore

+1 −0
@@ -1 +1,2 @@
 .DS_Store
+.venv

assets/gradio_chatbot_demo.png

324 KB (new binary file: screenshot of the Gradio chatbot demo)

gradio_demos/chatbot_demo.ipynb

+1,316 −0

Large diffs are not rendered by default.

gradio_demos/chatbot_demo.py

+89 −0
@@ -0,0 +1,89 @@

from huggingface_hub import login
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Log in to Hugging Face Hub
login()

# Determine the device to use (GPU if available, otherwise CPU)
device = 0 if torch.cuda.is_available() else -1

# Dictionary mapping model names to their Hugging Face Hub identifiers
llama_models = {
    "Llama 3 70B Instruct": "meta-llama/Meta-Llama-3-70B-Instruct",
    "Llama 3 8B Instruct": "meta-llama/Meta-Llama-3-8B-Instruct",
    "Llama 3.1 70B Instruct": "meta-llama/Llama-3.1-70B-Instruct",
    "Llama 3.1 8B Instruct": "meta-llama/Llama-3.1-8B-Instruct",
    "Llama 3.2 3B Instruct": "meta-llama/Llama-3.2-3B-Instruct",
    "Llama 3.2 1B Instruct": "meta-llama/Llama-3.2-1B-Instruct",
}

# Function to load the model and tokenizer
def load_model(model_name):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    generator = pipeline('text-generation', model=model, tokenizer=tokenizer, device=device)
    return generator

# Cache to store loaded models
model_cache = {}

# Function to generate chat responses
def generate_chat(user_input, history, model_choice):
    # Load the model if not already cached
    if model_choice not in model_cache:
        model_cache[model_choice] = load_model(llama_models[model_choice])
    generator = model_cache[model_choice]

    # Initial system prompt
    system_prompt = {"role": "system", "content": "You are a helpful assistant"}

    # Initialize history if it's None
    if history is None:
        history = [system_prompt]

    # Append user input to history
    history.append({"role": "user", "content": user_input})

    # Generate response using the model
    response = generator(
        history,
        max_length=512,
        pad_token_id=generator.tokenizer.eos_token_id,
        do_sample=True,
        temperature=0.7,
        top_p=0.9
    )[-1]["generated_text"][-1]["content"]

    # Append model response to history
    history.append({"role": "assistant", "content": response})

    return history

# Create Gradio interface
with gr.Blocks() as demo:
    gr.Markdown("<h1><center>Chat with Llama Models</center></h1>")

    # Dropdown to select model
    model_choice = gr.Dropdown(list(llama_models.keys()), label="Select Llama Model")
    # Chatbot interface
    chatbot = gr.Chatbot(label="Chatbot Interface", type="messages")
    # Textbox for user input
    txt_input = gr.Textbox(show_label=False, placeholder="Type your message here...")

    # Function to handle user input and generate response
    def respond(user_input, chat_history, model_choice):
        if model_choice is None:
            model_choice = list(llama_models.keys())[0]
        updated_history = generate_chat(user_input, chat_history, model_choice)
        return "", updated_history

    # Submit user input on pressing Enter
    txt_input.submit(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])
    # Button to submit user input
    submit_btn = gr.Button("Submit")
    submit_btn.click(respond, [txt_input, chatbot, model_choice], [txt_input, chatbot])

# Launch the Gradio demo
demo.launch()
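To run the demo locally, install `gradio`, `torch`, `transformers`, and `huggingface_hub`, then run `python gradio_demos/chatbot_demo.py`; the `login()` call prompts for a Hugging Face token, and the gated meta-llama checkpoints additionally require approved access on the Hub. The snippet below is a minimal sketch (not part of this commit) for sanity-checking the core chat-format pipeline call outside Gradio. It assumes a recent transformers release whose text-generation pipeline accepts a list of chat messages, and it uses the smallest model from the demo's dropdown:

import torch
from transformers import pipeline

# Build the same kind of pipeline the demo caches per model choice.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # smallest entry in llama_models
    device=0 if torch.cuda.is_available() else -1,
)

history = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Say hello in five words."},
]

# With chat messages as input, the pipeline returns
# [{"generated_text": [...input messages plus the assistant reply...]}],
# which is why the demo indexes [-1]["generated_text"][-1]["content"].
out = generator(history, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(out[-1]["generated_text"][-1]["content"])

One caveat worth noting: the script passes `max_length=512`, which bounds prompt and completion together, so a long chat history leaves little room for the reply; `max_new_tokens` (used above) bounds only the generated portion.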
