
Commit a012121

2 parents 8f7be49 + b23f10c commit a012121

3 files changed: 30 additions, 6 deletions


README.md

Lines changed: 9 additions & 4 deletions
@@ -101,10 +101,15 @@ This cookbook includes:
 * [Labs and workshops samples Phi-3]()
   * [C# .NET Labs](./md/07.Labs/Csharp/csharplabs.md)(✅)
   * [Build your own Visual Studio Code GitHub Copilot Chat with Microsoft Phi-3 in AIPC](./md/07.Labs/VSCode/README.md)(✅)
-
-* [ONNX runtime samples for Phi-3-vision]()
-  * [Phi-3-ONNX-Samples](https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html)(✅)
-
+  * [Phi-3 ONNX Tutorial](https://onnxruntime.ai/docs/genai/tutorials/phi3-python.html)(✅)
+  * [Phi-3-vision ONNX Tutorial](https://onnxruntime.ai/docs/genai/tutorials/phi3-v.html)(✅)
+  * [Run the Phi-3 models with the ONNX Runtime generate() API](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md)(✅)
+  * [Phi-3 ONNX Multi Model LLM Chat UI (a chat demo)](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/chat_app)(✅)
+  * [C# Hello Phi-3 ONNX example](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/csharp/HelloPhi)(✅)
+  * [C# API Phi-3 ONNX example supporting Phi-3-Vision](https://github.com/microsoft/onnxruntime-genai/tree/main/examples/csharp/HelloPhi3V)(✅)
+
+
+
 
 ## Contributing
 

md/01.Introduce/Hardwaresupport.md

Lines changed: 16 additions & 1 deletion
@@ -2,22 +2,37 @@
 
 Microsoft Phi-3 has been optimized for ONNX Runtime and supports Windows DirectML. It works well across various hardware types, including GPUs, CPUs, and even mobile devices.
 
+### Device Hardware
 Specifically, the supported hardware includes:
 
 - GPU SKU: RTX 4090 (DirectML)
 - GPU SKU: 1 A100 80GB (CUDA)
 - CPU SKU: Standard F64s v2 (64 vCPUs, 128 GiB memory)
 
-**Mobile SKU**
+### Mobile SKU
 
 - Android - Samsung Galaxy S21
 - Apple iPhone 14 or higher (A16/A17 processor)
 
+### Phi-3 Hardware Specification
 - Minimum Configuration Required:
   - Windows: DirectX 12-capable GPU and a minimum of 4GB of combined RAM
   - CUDA: NVIDIA GPU with Compute Capability >= 7.0
 
 ![HardwareSupport](../../imgs/00/phi3hardware.png)
 
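The CUDA requirement above is a simple version comparison; as a minimal illustrative sketch (this helper is not part of the cookbook, and reading the real capability from the driver is left out):

```python
def meets_cuda_requirement(major: int, minor: int, required=(7, 0)) -> bool:
    """Return True if a GPU's compute capability (major, minor) meets
    the minimum needed for Phi-3 CUDA inference. Tuple comparison
    handles cases like 7.5 >= 7.0 correctly."""
    return (major, minor) >= required

# On a real system with recent drivers, the capability can be queried with:
#   nvidia-smi --query-gpu=compute_cap --format=csv
```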
+### Running onnxruntime on multiple GPUs
+
+The currently available Phi-3 ONNX models target a single GPU. Multi-GPU support for the Phi-3 model is possible, but ORT on 2 GPUs is not guaranteed to give more throughput than 2 separate ORT instances.
+
+At [Build 2024 the GenAI ONNX Team](https://youtu.be/WLW4SE8M9i8?si=EtG04UwDvcjunyfC) announced that they had enabled multi-instance rather than multi-GPU support for the Phi models.
+
+At present this lets you run one onnxruntime or onnxruntime-genai instance per GPU by setting the CUDA_VISIBLE_DEVICES environment variable, like this:
+
+```bash
+CUDA_VISIBLE_DEVICES=0 python infer.py
+CUDA_VISIBLE_DEVICES=1 python infer.py
+```
 
 Feel free to explore Phi-3 further in [Azure AI Studio](https://ai.azure.com)
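The multi-instance pattern above can also be driven from Python; this is a minimal sketch (the `infer.py` entry point and the GPU ids are assumptions) that pins each spawned process to one device:

```python
import os
import subprocess


def launch_command(script, gpu_id):
    """Build the command and environment that pin one inference process
    to a single GPU via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    return ["python", script], env


def launch_per_gpu(script, gpu_ids):
    """Spawn one process per GPU (multi-instance, not multi-GPU):
    each process sees exactly one device, exposed to ORT as device 0."""
    procs = []
    for gpu_id in gpu_ids:
        cmd, env = launch_command(script, gpu_id)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs


# Example usage (assumes a hypothetical infer.py exists):
# for p in launch_per_gpu("infer.py", [0, 1]):
#     p.wait()
```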

md/01.Introduce/Phi3Family.md

Lines changed: 5 additions & 1 deletion
@@ -19,7 +19,7 @@ The Phi-3 Family includes mini, small, medium and vision versions, trained based
 
 Phi-3-mini is a 3.8B parameter language model, available in two context lengths: [128K](https://aka.ms/phi3-mini-128k-azure-ai) and [4K](https://aka.ms/phi3-mini-4k-azure-ai).
 
 Phi-3-Mini is a Transformer-based language model with 3.8 billion parameters. It was trained on high-quality, educationally useful data, augmented with new data sources consisting of various synthetic NLP texts and both internal and external chat datasets, which significantly improve its chat capabilities. Additionally, Phi-3-Mini was chat fine-tuned after pre-training through supervised fine-tuning (SFT) and Direct Preference Optimization (DPO). Following this post-training, Phi-3-Mini demonstrated significant improvements in several capabilities, particularly alignment, robustness, and safety. The model is part of the Phi-3 family and comes in the Mini version with two variants, 4K and 128K, which represent the context length (in tokens) that it can support.
 
 ## **Phi-3-Small**
 

@@ -48,6 +48,10 @@ Phi Silica API along with OCR, Studio Effects, Live Captions, Recall User Activi
 - [Azure AI](https://aka.ms/phi3-azure-ai)
 - [Hugging Face](https://aka.ms/phi3-hf)
 
+## ONNX Models
+
+The primary difference between the two ONNX models, "cpu-int4-rtn-block-32" and "cpu-int4-rtn-block-32-acc-level-4", is the accuracy level. The "acc-level-4" model trades a small amount of accuracy for lower latency, which can make it particularly suitable for mobile devices.
+
 ## Example of Model Selection
 
 | | | | |
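The variant choice described above can be expressed as a tiny helper; this is an illustrative sketch only (mapping mobile devices to the acc-level-4 variant is a heuristic suggested by the paragraph above, not an official rule):

```python
def pick_onnx_model(device_class: str) -> str:
    """Pick a Phi-3 ONNX CPU model variant by device class.
    The directory names match the published Phi-3 ONNX variants;
    the mobile-to-acc-level-4 mapping is a heuristic."""
    if device_class in ("android", "ios"):
        # acc-level-4 trades a little accuracy for lower latency.
        return "cpu-int4-rtn-block-32-acc-level-4"
    return "cpu-int4-rtn-block-32"
```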
