# ExecuTorch Llama Android Demo App

We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app, and how to exercise the many features ExecuTorch and Llama models have to offer.

This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.

Please dive in and start exploring our demo app today! We look forward to any feedback, and we are excited to see your innovative ideas.
## Key Concepts
From this demo app, you will learn many key concepts, such as:
* How to prepare Llama models, build the ExecuTorch library, and run model inference across delegates
* How to expose the ExecuTorch library through a JNI layer
* The current ExecuTorch app-facing capabilities

The goal is for you to see the type of support ExecuTorch provides and to feel comfortable leveraging it for your own use cases.

## Supported Models
The models this app supports (availability varies by delegate) are:
* Llama 3.1 8B
* Llama 3 8B
* Llama 2 7B
* LLaVA-1.5 vision model (XNNPACK only)
52 | 24 |
|
53 |
| -If you need to use other dependencies (like tokenizer), please refer to |
54 |
| -Alternative 2: Build from local machine option. |
55 | 25 |
|
56 |
| -### Alternative 2: Build from local machine |
57 |
| -1. Open a terminal window and navigate to the root directory of the `executorch`. |
58 |
| -2. Set the following environment variables: |
59 |
| -```bash |
60 |
| -export ANDROID_NDK=<path_to_android_ndk> |
61 |
| -export ANDROID_ABI=arm64-v8a |
62 |
| -``` |
63 |
| -Note: `<path_to_android_ndk>` is the root for the NDK, which is usually under |
64 |
| -`~/Library/Android/sdk/ndk/XX.Y.ZZZZZ` for macOS, and contains NOTICE and README.md. |
65 |
| -We use `<path_to_android_ndk>/build/cmake/android.toolchain.cmake` for CMake to cross-compile. |
66 |
| - |
67 |
| -3. Build the Android Java extension code: |
68 |
| -```bash |
69 |
| -pushd extension/android |
70 |
| -./gradlew build |
71 |
| -popd |
72 |
| -``` |
| 26 | +## Building the APK |
| 27 | +First it’s important to note that currently ExecuTorch provides support across 3 delegates. Once you identify the delegate of your choice, select the README link to get a complete end-to-end instructions for environment set-up to exporting the models to build ExecuTorch libraries and apps to run on device: |

| Delegate | Resource |
| ------------- | ------------- |
| XNNPACK (CPU-based library) | [link](docs/delegates/xnnpack_README.md) |
| QNN (Qualcomm AI Accelerators) | Coming soon |
| MediaTek (MediaTek AI Accelerators) | [link](docs/delegates/mediatek_README.md) |

## How to Use the App

This section covers the main steps to use the app, along with code snippets of the relevant ExecuTorch APIs.

For loading the app, development, and running on device, we recommend Android Studio:
1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo.
2. Run the app (^R). This builds and launches the app on the phone.

### Opening the App

Below are the UI features of the app.

Select the settings widget to get started with picking a model, its parameters, and any prompts.
<p align="center">
<img src="docs/screenshots/opening_the_app_details.png" width=800>
</p>

### Select Models and Parameters

Once you've selected the model, tokenizer, and model type, you are ready to click "Load Model" to have the app load the model and return to the main Chat activity.
<p align="center">
<img src="docs/screenshots/settings_menu.png" width=300>
</p>

Optional parameters:
* Temperature: Defaults to 0. You can adjust the temperature for the model as well; the model will reload after any adjustment.
* System Prompt: Without any formatting, you can enter a system prompt, for example "you are a travel assistant" or "give me a response in a few sentences".
* User Prompt: Intended for more advanced users. If you would like to manually input a prompt, you can do so by modifying the `{{user prompt}}` field. You can also modify the special tokens. Once changed, go back to the main Chat activity to send.
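
As an illustration of what editing the special tokens by hand involves, here is a minimal sketch of the Llama 3 chat template. The `PromptFormatter` class is hypothetical (it is not part of the app); the special token strings are the standard Llama 3 chat markers.

```java
// Hypothetical helper (not part of the demo app): builds a Llama 3 style chat
// prompt. The special tokens below are the standard Llama 3 chat markers.
public class PromptFormatter {
    public static String llama3Prompt(String systemPrompt, String userPrompt) {
        return "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
                + systemPrompt
                + "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
                + userPrompt
                + "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n";
    }
}
```

A Llama 2 style prompt uses different markers (`[INST] ... [/INST]`), which is why the special tokens are editable per model type.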

> [!TIP]
> Helpful ExecuTorch API in app

```java
// Upon returning to the Main Chat Activity
mModule = new LlamaModule(
    ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()),
    modelPath,
    tokenizerPath,
    temperature);
int loadResult = mModule.load();
```

* `modelCategory`: indicates whether it’s a text-only or a vision model
* `modelPath`: path to the .pte file
* `tokenizerPath`: path to the tokenizer .bin file
* `temperature`: model parameter to adjust the randomness of the model’s output
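
Before constructing the module, an app may want to sanity-check the files chosen in the Settings menu. `ModelFileCheck` below is a hypothetical helper, not an ExecuTorch API; it only assumes the file-extension conventions mentioned above (.pte model, .bin tokenizer).

```java
// Hypothetical pre-flight check (not an ExecuTorch API): verify that the files
// picked in the Settings menu have the extensions the app expects before they
// are passed to the LlamaModule constructor.
public class ModelFileCheck {
    public static boolean looksLoadable(String modelPath, String tokenizerPath) {
        return modelPath != null && modelPath.endsWith(".pte")
                && tokenizerPath != null && tokenizerPath.endsWith(".bin");
    }
}
```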

### User Prompt
Once the model has loaded successfully, enter any prompt and click the send (i.e. generate) button to send it to the model.
<p align="center">
<img src="docs/screenshots/load_complete_and_start_prompt.png" width=300>
</p>

You can ask it follow-up questions as well.
<p align="center">
<img src="docs/screenshots/chat.png" width=300>
</p>

> [!TIP]
> Helpful ExecuTorch API in app
```java
mModule.generate(prompt, sequence_length, MainActivity.this);
```
* `prompt`: user-formatted prompt
* `sequence_length`: number of tokens to generate in response to the prompt
* `MainActivity.this`: indicates that the callback functions (onResult(), onStats()) are present in this class
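
Because the response is streamed through callbacks and generation blocks until complete, an app will typically run `generate()` on a worker thread. The sketch below illustrates that pattern under stated assumptions: `FakeModule` is a stand-in for `LlamaModule`, and the threading arrangement is common Android practice rather than a detail taken from the demo app's code.

```java
// Sketch (an assumed app pattern, not the demo app's exact code): run a
// blocking, callback-driven generate() off the UI thread via an executor.
// FakeModule stands in for LlamaModule so the sketch is self-contained.
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GenerateOffUiThread {
    interface Callback { void onResult(String token); }

    static class FakeModule {  // stand-in for the real LlamaModule
        void generate(String prompt, int seqLen, Callback cb) {
            for (int i = 0; i < seqLen; i++) cb.onResult("tok" + i + " ");
        }
    }

    // Runs generation on a worker thread and returns the token count.
    public static int run(String prompt, int seqLen) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        final int[] count = {0};
        try {
            pool.submit(() ->
                    new FakeModule().generate(prompt, seqLen, tok -> count[0]++)).get();
        } catch (InterruptedException | ExecutionException e) {
            return -1;  // generation failed or was interrupted
        } finally {
            pool.shutdown();
        }
        return count[0];
    }
}
```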

[*LLaVA-1.5: Only for XNNPACK delegate*]

For the LLaVA-1.5 implementation, select the exported LLaVA .pte and tokenizer file in the Settings menu and load the model. After this, you can send an image from your gallery, or take a live picture, along with a text prompt to the model.

<p align="center">
<img src="docs/screenshots/llava_example.png" width=300>
</p>

### Output Generated
To show completion of the follow-up question, here is the complete detailed response from the model.
<p align="center">
<img src="docs/screenshots/chat_response.png" width=300>
</p>

> [!TIP]
> Helpful ExecuTorch API in app

Ensure the following functions are present in the callback class that you provided to `mModule.generate()`. For this example, it is `MainActivity.this`.
```java
@Override
public void onResult(String result) {
    // ...result contains a token from the response
    // onResult will continue to be invoked until the response is complete
}

@Override
public void onStats(float tps) {
    // ...tps (tokens per second) stats provided by the framework
}
```
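
To make the callback contract concrete, here is a small sketch of how an app might consume these callbacks. This is an assumed pattern (the `ResponseCollector` class is hypothetical, not the demo app's code): append each streamed token from `onResult()` and keep the latest tokens-per-second figure from `onStats()`.

```java
// Hypothetical consumer of the generate() callbacks: accumulates streamed
// tokens into the full response text and records the last reported tps.
public class ResponseCollector {
    private final StringBuilder text = new StringBuilder();
    private float lastTps;

    public void onResult(String result) {
        text.append(result);   // tokens arrive incrementally until completion
    }

    public void onStats(float tps) {
        lastTps = tps;         // tokens-per-second reported by the framework
    }

    public String text() { return text.toString(); }
    public float tps() { return lastTps; }
}
```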

## Reporting Issues
If you encounter any bugs or issues while following this tutorial, please file a bug/issue on [GitHub](https://github.com/pytorch/executorch/issues/new).