
Commit 74a56e4

cmodi-meta authored and facebook-github-bot committed
Update Android ExecuTorch Llama demo app readme docs (#5364)
Summary:
* Revamp and standardize the readme docs structure for better clarity.
* Added XNNPACK and MTK delegate instructions for Android.
* Added app screenshots for user instruction.
* Added a `docs` directory to host README materials.

Pull Request resolved: #5364
Reviewed By: larryliu0820
Differential Revision: D62667589
Pulled By: cmodi-meta
fbshipit-source-id: 024d989e802886f21e0df2d561bd8aa49b1ab90e
1 parent 262dfc0 commit 74a56e4

12 files changed (+434, -83 lines)
Lines changed: 121 additions & 83 deletions
# ExecuTorch Llama Android Demo App

We’re excited to share that the newly revamped Android demo app is live and includes many new updates to provide a more intuitive and smoother user experience with a chat use case! The primary goal of this app is to showcase how easily ExecuTorch can be integrated into an Android demo app and how to exercise the many features ExecuTorch and Llama models have to offer.

This app serves as a valuable resource to inspire your creativity and provide foundational code that you can customize and adapt for your particular use case.

Please dive in and start exploring our demo app today! We look forward to any feedback and are excited to see your innovative ideas.


## Key Concepts
From this demo app, you will learn many key concepts such as:
* How to prepare Llama models, build the ExecuTorch library, and run model inference across delegates
* How to expose the ExecuTorch library to the app via a JNI layer (sketched below)
* The current ExecuTorch app-facing capabilities
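To make the JNI bullet concrete, the sketch below shows the general JNI pattern such a binding follows. The class, library, and method names here are purely illustrative, not the actual ExecuTorch binding; see the Java snippets later in this README for the real app-facing API.

```java
// Generic JNI pattern (names are illustrative only, not the real ExecuTorch binding).
public class NativeLlamaBinding {
    static {
        // Load the native (C++) library that implements the methods below.
        System.loadLibrary("llama_demo_jni"); // hypothetical library name
    }

    // Declared in Java, implemented in C++, and invoked through JNI.
    public static native long load(String modelPath, String tokenizerPath);
    public static native void generate(long handle, String prompt);
}
```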

The goal is for you to see the type of support ExecuTorch provides and feel comfortable with leveraging it for your use cases.

## Supported Models
The models that this app supports (availability varies by delegate):
* Llama 3.1 8B
* Llama 3 8B
* Llama 2 7B
* LLaVA-1.5 vision model (XNNPACK only)


## Building the APK
First, it’s important to note that ExecuTorch currently provides support across three delegates. Once you identify the delegate of your choice, select the README link below for complete end-to-end instructions, from environment set-up and exporting the models to building the ExecuTorch libraries and running the app on device:

| Delegate | Resource |
| ------------- | ------------- |
| XNNPACK (CPU-based library) | [link](docs/delegates/xnnpack_README.md) |
| QNN (Qualcomm AI Accelerators) | Coming soon |
| MediaTek (MediaTek AI Accelerators) | [link](docs/delegates/mediatek_README.md) |

## How to Use the App

This section will provide the main steps to use the app, along with a code snippet of the ExecuTorch API.

For loading the app, development, and running on device, we recommend Android Studio:
1. Open Android Studio and select "Open an existing Android Studio project" to open examples/demo-apps/android/LlamaDemo.
2. Run the app (^R). This builds and launches the app on the phone.

### Opening the App

Below are the UI features of the app.

Select the settings widget to get started with picking a model, its parameters, and any prompts.
<p align="center">
<img src="docs/screenshots/opening_the_app_details.png" width=800>
</p>

### Select Models and Parameters

Once you've selected the model, tokenizer, and model type, you are ready to click on "Load Model" to have the app load the model and go back to the main Chat activity.
<p align="center">
<img src="docs/screenshots/settings_menu.png" width=300>
</p>

Optional Parameters:
* Temperature: Defaulted to 0, you can adjust the temperature for the model as well. The model will reload upon any adjustment.
* System Prompt: Without any formatting, you can enter a system prompt. For example, "you are a travel assistant" or "give me a response in a few sentences".
* User Prompt: More for the advanced user, if you would like to manually input a prompt you can do so by modifying the `{{user prompt}}`. You can also modify the special tokens. Once changed, go back to the main Chat activity to send.

> [!TIP]
> Helpful ExecuTorch API in app
```java
// Upon returning to the Main Chat Activity
mModule = new LlamaModule(
        ModelUtils.getModelCategory(mCurrentSettingsFields.getModelType()),
        modelPath,
        tokenizerPath,
        temperature);
int loadResult = mModule.load();
```

* `modelCategory`: indicates whether it’s a text-only or vision model
* `modelPath`: path to the .pte file
* `tokenizerPath`: path to the tokenizer .bin file
* `temperature`: model parameter to adjust the randomness of the model’s output
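As a usage sketch, you can branch on the status returned by `load()` before generating. Treating non-zero as failure is an assumption made here for illustration rather than a documented contract; check the app source for the exact convention.

```java
// Sketch: check the load status before generating (uses android.util.Log).
int loadResult = mModule.load();
if (loadResult != 0) {
    // Assumption: a non-zero status means the model failed to load
    // (e.g. a wrong .pte or tokenizer path was selected in Settings).
    Log.e("LlamaDemo", "Model load failed with status " + loadResult);
}
```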

### User Prompt
Once the model is successfully loaded, enter any prompt and click the send (i.e. generate) button to send it to the model.
<p align="center">
<img src="docs/screenshots/load_complete_and_start_prompt.png" width=300>
</p>

You can ask it more follow-up questions as well.
<p align="center">
<img src="docs/screenshots/chat.png" width=300>
</p>

> [!TIP]
> Helpful ExecuTorch API in app
```java
mModule.generate(prompt, sequence_length, MainActivity.this);
```
* `prompt`: user-formatted prompt
* `sequence_length`: number of tokens to generate in response to the prompt
* `MainActivity.this`: indicates that the callback functions (onResult(), onStats()) are implemented in this class
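For example, a manually formatted Llama 3-style prompt could look like the sketch below. The template tokens mirror the sample prompt shown in the MediaTek section of these docs; the exact template your exported model expects may differ, and the sequence length of 256 is an arbitrary example value.

```java
// Illustrative only: a Llama 3-style chat template wrapped around a user question.
String prompt =
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        + "What can you help me with?<|eot_id|>"
        + "<|start_header_id|>assistant<|end_header_id|>\n\n";
mModule.generate(prompt, 256, MainActivity.this); // 256 = example sequence length
```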

[*LLaVA-1.5: Only for XNNPACK delegate*]

For the LLaVA-1.5 implementation, select the exported LLaVA .pte and tokenizer file in the Settings menu and load the model. After this, you can send an image from your gallery, or take a live picture, along with a text prompt to the model.

<p align="center">
<img src="docs/screenshots/llava_example.png" width=300>
</p>

### Output Generated
To show completion of the follow-up question, here is the complete detailed response from the model.
<p align="center">
<img src="docs/screenshots/chat_response.png" width=300>
</p>

> [!TIP]
> Helpful ExecuTorch API in app

Ensure you have the following functions in the callback class that you provided to `mModule.generate()`. For this example, it is `MainActivity.this`.
```java
@Override
public void onResult(String result) {
    // ...result contains a token from the response
    // ...onResult will continue to be invoked until the response is complete
}

@Override
public void onStats(float tps) {
    // ...tps (tokens per second) stats are provided by the framework
}
```
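Putting the pieces together, here is a minimal sketch of how the calls above can be wired up in a single class. Only the constructor, `load()`, `generate()`, `onResult()`, and `onStats()` calls come from the snippets above; the `LlamaCallback` interface name, the wrapper class, and the zero-means-success load check are assumptions for illustration.

```java
// Hypothetical wrapper around the API shown in this README.
public class ChatController implements LlamaCallback {
    private final StringBuilder response = new StringBuilder();
    private LlamaModule mModule;

    void loadAndAsk(int modelCategory, String modelPath, String tokenizerPath, String prompt) {
        mModule = new LlamaModule(modelCategory, modelPath, tokenizerPath, /* temperature */ 0.0f);
        if (mModule.load() == 0) { // assumption: 0 indicates success
            mModule.generate(prompt, 256, this); // tokens stream into onResult()
        }
    }

    @Override
    public void onResult(String result) {
        response.append(result); // accumulate streamed tokens as they arrive
    }

    @Override
    public void onStats(float tps) {
        // tokens-per-second stat reported by the framework
    }
}
```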

## Reporting Issues
If you encounter any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
Lines changed: 157 additions & 0 deletions
# Building ExecuTorch Android Demo for Llama running MediaTek
This tutorial covers the end-to-end workflow for running Llama 3-8B-instruct inference on MediaTek AI accelerators on an Android device.
More specifically, it covers:
1. Export and quantization of Llama models against the MediaTek backend.
2. Building and linking the libraries required to run inference on-device for the Android platform using MediaTek AI accelerators.
3. Loading the needed files onto the device and running inference.

Verified on macOS and Linux CentOS (model export), Python 3.10, Android NDK 25.0.8775105.
Phone verified: MediaTek Dimensity 9300 (D9300) chip.

## Prerequisites
* Download and link the Buck2 build, Android NDK, and MediaTek ExecuTorch Libraries from the MediaTek Backend Readme ([link](https://github.com/pytorch/executorch/tree/main/backends/mediatek/scripts#prerequisites)).
* A device with a MediaTek Dimensity 9300 (D9300) chip
* Desired Llama 3 model weights. You can download them on HuggingFace ([example](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)).
* `libneuronusdk_adapter.mtk.so`, `libneuron_buffer_allocator.so`, and `.whl` files (will be made available soon by MediaTek)

## Setup ExecuTorch
In this section, we will set up the ExecuTorch repo with Conda environment management. Make sure you have Conda available on your system (or follow the instructions to install it [here](https://anaconda.org/anaconda/conda)). The commands below were run on Linux (CentOS).

Create a Conda environment
```
conda create -yn et_mtk python=3.10.0
conda activate et_mtk
```

Check out the ExecuTorch repo and sync submodules
```
git clone https://github.com/pytorch/executorch.git
cd executorch
git submodule sync
git submodule update --init
```
Install dependencies
```
./install_requirements.sh
```
## Setup Environment Variables
### Download Buck2 and make executable
* Download Buck2 from the official [Release Page](https://github.com/facebook/buck2/releases/tag/2024-02-01)
* Create the buck2 executable
```
zstd -cdq "<downloaded_buck2_file>.zst" > "<path_to_store_buck2>/buck2" && chmod +x "<path_to_store_buck2>/buck2"
```

### MediaTek ExecuTorch Libraries
The following libraries will be made available soon by MediaTek:
* `libneuronusdk_adapter.mtk.so`: This universal SDK contains the implementation required for executing target-dependent code on the MediaTek chip.
* `libneuron_buffer_allocator.so`: This utility library is designed for allocating DMA buffers necessary for model inference.

### Set Environment Variables
```
export BUCK2=path_to_buck/buck2 # path to the buck2 executable created above
export ANDROID_NDK=path_to_android_ndk
export NEURON_BUFFER_ALLOCATOR_LIB=path_to_buffer_allocator/libneuron_buffer_allocator.so
```

## Build Backend and MTK Llama Runner
Next we need to build and compile the MTK backend and MTK Llama runner.
```
cd examples/mediatek
./mtk_build_examples.sh
```

This will generate a `cmake-android-out` folder containing a runner executable for running inference with Llama models, plus the backend library:
* `cmake-android-out/examples/mediatek/mtk_llama_executor_runner`
* `cmake-android-out/backends/mediatek/libneuron_backend.so`

## Export Llama Model
MTK currently supports exporting Llama 3.

### Set up Environment
1. Follow the ExecuTorch set-up environment instructions found on the [Getting Started](https://pytorch.org/executorch/stable/getting-started-setup.html) page
2. Set up the MTK AoT environment
```
# Ensure that you are inside the executorch/examples/mediatek directory
pip3 install -r requirements.txt

# The following .whl files will be available soon
pip3 install mtk_neuron-8.2.2-py3-none-linux_x86_64.whl
pip3 install mtk_converter-8.8.0.dev20240723+public.d1467db9-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
```

This was tested with transformers version 4.40 and numpy version 1.23. If you do not have these versions, use the following commands:
```
pip install transformers==4.40

pip install numpy==1.23
```

### Running Export
Prior to exporting, place the config.json, relevant tokenizer files, and .bin or .safetensor weight files in `examples/mediatek/models/llm_models/weights`.

Here is an export example ([details](https://github.com/pytorch/executorch/tree/main/examples/mediatek#aot-flow)):
```
cd examples/mediatek
# num_chunks=4, num_tokens=128, cache_size=512
source shell_scripts/export_llama.sh llama3 "" "" "" alpaca.txt
```

There will be 3 main sets of files generated:
* num_chunks*2 .pte files: half are for prompt processing and the other half are for generation. Generation .pte files are denoted by “1t” in the file name.
* Token embedding bin file: located in the weights folder where `config.json` is placed (`examples/mediatek/models/llm_models/weights/<model_name>/embedding_<model_name>_fp32.bin`)
* Tokenizer file: `tokenizer.model`

Note: The model export flow can take 2.5 hours (114GB of RAM for num_chunks=4) to complete. (Results may vary depending on hardware.)

Before continuing, make sure to modify the tokenizer, token embedding, and model paths in examples/mediatek/executor_runner/run_llama3_sample.sh.

## Deploy Files on Device

### Prepare to Deploy
Prior to deploying the files on device, make sure to modify the tokenizer, token embedding, and model file names in examples/mediatek/executor_runner/run_llama3_sample.sh to reflect what was generated during the Export Llama Model step.

<p align="center">
<img src="../screenshots/mtk_changes_to_shell_file.png" width=600>
</p>

In addition, create a sample_prompt.txt file with a prompt. This will be deployed to the device in the next step.
* Example content of a sample_prompt.txt file:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful AI assistant for travel tips and recommendations<|eot_id|><|start_header_id|>user<|end_header_id|>

What can you help me with?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```

### Deploy
First, make sure your Android phone’s chipset is compatible with this demo (MediaTek Dimensity 9300 (D9300)). Once you have the model, tokenizer, and runner generated and ready, you can push them and the .so files to the device before running the model with the runner via shell.

```
adb shell mkdir -p /data/local/tmp/llama
adb push examples/mediatek/executor_runner/run_llama3_sample.sh /data/local/tmp/llama
adb push sample_prompt.txt /data/local/tmp/llama
adb push cmake-android-out/examples/mediatek/mtk_llama_executor_runner /data/local/tmp/llama
adb push cmake-android-out/backends/mediatek/libneuron_backend.so /data/local/tmp/llama
adb push libneuron_buffer_allocator.so /data/local/tmp/llama
adb push libneuronusdk_adapter.mtk.so /data/local/tmp/llama
adb push embedding_<model_name>_fp32.bin /data/local/tmp/llama
adb push tokenizer.model /data/local/tmp/llama
```

## Run Demo
At this point we have pushed all the required files to the device and we are ready to run the demo!
```
adb shell

<android_device>:/ $ cd data/local/tmp/llama
<android_device>:/data/local/tmp/llama $ sh run_llama3_sample.sh
```

<p align="center">
<img src="../screenshots/mtk_output.png" width=800>
</p>

## Reporting Issues
If you encounter any bugs or issues while following this tutorial, please file an issue on [GitHub](https://github.com/pytorch/executorch/issues/new).
