
Commit a012121

2 parents 8f7be49 + b23f10c commit a012121

3 files changed: 30 additions, 6 deletions


README.md

Lines changed: 9 additions & 4 deletions
@@ -101,10 +101,15 @@ This cookbook includes:
 * [Labs and workshops samples Phi-3]()
   * [C# .NET Labs](./md/07.Labs/Csharp/csharplabs.md)(✅)
   * [Build your own Visual Studio Code GitHub Copilot Chat with Microsoft Phi-3 in AIPC](./md/07.Labs/VSCode/README.md)(✅)
-
-* [ONNX runtime samples for Phi-3-vision]()
-  * [Phi-3-ONNX-Samples](https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html)(✅)
-
+  * [Phi-3 ONNX Tutorial](https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html)(✅)
+  * [Phi-3-vision ONNX Tutorial](https://onnxruntime.ai/docs/genai/tutorials/phi3-v.html)(✅)
+  * [Run the Phi-3 models with the ONNX Runtime generate() API](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md)(✅)
+  * [Phi-3 ONNX Multi Model LLM Chat UI (a chat demo)](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/chat_app)(✅)
+  * [C# Hello Phi-3 ONNX example](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/csharp/HelloPhi)(✅)
+  * [C# API Phi-3 ONNX example supporting Phi-3-Vision](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/csharp/HelloPhi3V)(✅)
+
+
+
 
 ## Contributing
 

md/01.Introduce/Hardwaresupport.md

Lines changed: 16 additions & 1 deletion
@@ -2,22 +2,37 @@
 
 Microsoft Phi-3 has been optimized for ONNX Runtime and supports Windows DirectML. It works well across various hardware types, including GPUs, CPUs, and even mobile devices.
 
+### Device Hardware
 Specifically, the supported hardware includes:
 
 - GPU SKU: RTX 4090 (DirectML)
 - GPU SKU: 1 A100 80GB (CUDA)
 - CPU SKU: Standard F64s v2 (64 vCPUs, 128 GiB memory)
 
-**Mobile SKU**
+### Mobile SKU
 
 - Android - Samsung Galaxy S21
 - Apple iPhone 14 or higher (A16/A17 processor)
 
+### Phi-3 Hardware Specification
 - Minimum Configuration Required:
   - Windows: DirectX 12-capable GPU and a minimum of 4GB of combined RAM
   - CUDA: NVIDIA GPU with Compute Capability >= 7.0
 
 ![HardwareSupport](../../imgs/00/phi3hardware.png)
 
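The CUDA requirement above is a simple version comparison; as a minimal illustrative sketch (this helper is not part of the cookbook, and reading the real capability from the driver is left out):

```python
def meets_cuda_requirement(major: int, minor: int, required=(7, 0)) -> bool:
    """Return True if a GPU's compute capability (major, minor) meets
    the minimum needed for Phi-3 CUDA inference. Tuple comparison
    handles cases like 7.5 >= 7.0 correctly."""
    return (major, minor) >= required

# On a real system with recent drivers, the capability can be queried with:
#   nvidia-smi --query-gpu=compute_cap --format=csv
```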
+### Running onnxruntime on multiple GPUs
+
+The currently available Phi-3 ONNX models target a single GPU. Multi-GPU support for the Phi-3 model is possible, but ORT on 2 GPUs is not guaranteed to give more throughput than 2 separate ORT instances.
+
+At [Build 2024 the GenAI ONNX Team](https://youtu.be/WLW4SE8M9i8?si=EtG04UwDvcjunyfC) announced that they had enabled multi-instance rather than multi-GPU support for the Phi models.
+
+At present this lets you run one onnxruntime or onnxruntime-genai instance per GPU by setting the CUDA_VISIBLE_DEVICES environment variable, like this:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python infer.py
+CUDA_VISIBLE_DEVICES=1 python infer.py
+```
 
 Feel free to explore Phi-3 further in [Azure AI Studio](https://ai.azure.com)
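The multi-instance pattern above can also be driven from Python; this is a minimal sketch (the `infer.py` entry point and the GPU ids are assumptions) that pins each spawned process to one device:

```python
import os
import subprocess


def launch_command(script, gpu_id):
    """Build the command and environment that pin one inference process
    to a single GPU via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return ["python", script], env


def launch_per_gpu(script, gpu_ids):
    """Spawn one process per GPU (multi-instance, not multi-GPU):
    each process sees exactly one device, exposed to ORT as device 0."""
    procs = []
    for gpu_id in gpu_ids:
        cmd, env = launch_command(script, gpu_id)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs


# Example usage (assumes a hypothetical infer.py exists):
# for p in launch_per_gpu("infer.py", [0, 1]):
#     p.wait()
```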

md/01.Introduce/Phi3Family.md

Lines changed: 5 additions & 1 deletion
@@ -19,7 +19,7 @@ The Phi-3 Family includes mini, small, medium and vision versions, trained based
 
 Phi-3-mini is a 3.8B parameter language model, available in two context lengths: [128K](https://aka.ms/phi3-mini-128k-azure-ai) and [4K](https://aka.ms/phi3-mini-4k-azure-ai).
 
 Phi-3-Mini is a Transformer-based language model with 3.8 billion parameters. It was trained on high-quality, educationally useful data, augmented with new data sources consisting of various synthetic NLP texts and both internal and external chat datasets, which significantly improve its chat capabilities. Additionally, Phi-3-Mini was chat fine-tuned after pre-training through supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). Following this post-training, Phi-3-Mini demonstrated significant improvements in several capabilities, particularly alignment, robustness, and safety. The model is part of the Phi-3 family and comes in the Mini version with two variants, 4K and 128K, which represent the context length (in tokens) that it can support.
 
 ## **Phi-3-Small**
 

@@ -48,6 +48,10 @@ Phi Silica API along with OCR, Studio Effects, Live Captions, Recall User Activi
 - [Azure AI](https://aka.ms/phi3-azure-ai)
 - [Hugging Face](https://aka.ms/phi3-hf)
 
+## ONNX Models
+
+The primary difference between the two ONNX models, "cpu-int4-rtn-block-32" and "cpu-int4-rtn-block-32-acc-level-4", is the accuracy level. The "acc-level-4" model trades a small amount of accuracy for lower latency, which can make it particularly suitable for mobile devices.
+
 ## Example of Model Selection
 
 | | | | |
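The variant choice described above can be expressed as a tiny helper; this is an illustrative sketch only (mapping mobile devices to the acc-level-4 variant is a heuristic suggested by the paragraph above, not an official rule):

```python
def pick_onnx_model(device_class: str) -> str:
    """Pick a Phi-3 ONNX CPU model variant by device class.
    The directory names match the published Phi-3 ONNX variants;
    the mobile-to-acc-level-4 mapping is a heuristic."""
    if device_class in ("android", "ios"):
        # acc-level-4 trades a little accuracy for lower latency.
        return "cpu-int4-rtn-block-32-acc-level-4"
    return "cpu-int4-rtn-block-32"
```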
