Commit ba8a1f9

examples : add README.md to tts example [no ci] (#11155)
* examples : add README.md to tts example [no ci]

* squash! examples : add README.md to tts example [no ci]

  Fix heading to be consistent with other examples, and add a quickstart
  section to README.md.

* squash! examples : add README.md to tts example [no ci]

  Fix spelling mistake.
1 parent ff3fcab commit ba8a1f9

File tree

1 file changed (+80, −0)

examples/tts/README.md

# llama.cpp/example/tts

This example demonstrates the Text To Speech feature. It uses a
[model](https://www.outeai.com/blog/outetts-0.2-500m) from
[outeai](https://www.outeai.com/).

## Quickstart

If you have built llama.cpp with `-DLLAMA_CURL=ON` you can simply run the
following command and the required models will be downloaded automatically:
```console
$ build/bin/llama-tts --tts-oute-default -p "Hello world" && aplay output.wav
```
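If curl support is not enabled yet, a minimal build sketch using the flag
mentioned above, assuming CMake and the libcurl development files are
installed:
```console
$ cmake -B build -DLLAMA_CURL=ON
$ cmake --build build --config Release
```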
For details about the models and how to convert them to the required format
see the following sections.

### Model conversion

Check out or download the repository that contains the LLM model:
```console
$ pushd models
$ git clone --branch main --single-branch --depth 1 https://huggingface.co/OuteAI/OuteTTS-0.2-500M
$ cd OuteTTS-0.2-500M && git lfs install && git lfs pull
$ popd
```
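The conversion steps below are run in a Python virtual environment (the
`(venv)` prompt in the examples). A minimal setup sketch, assuming the
`requirements.txt` at the repository root covers the conversion scripts'
dependencies:
```console
$ python3 -m venv venv
$ source venv/bin/activate
(venv) pip install -r requirements.txt
```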
Convert the model to .gguf format:
```console
(venv) python convert_hf_to_gguf.py models/OuteTTS-0.2-500M \
    --outfile models/outetts-0.2-0.5B-f16.gguf --outtype f16
```
The generated model will be `models/outetts-0.2-0.5B-f16.gguf`.

We can optionally quantize this to Q8_0 using the following command:
```console
$ build/bin/llama-quantize models/outetts-0.2-0.5B-f16.gguf \
    models/outetts-0.2-0.5B-q8_0.gguf q8_0
```
The quantized model will be `models/outetts-0.2-0.5B-q8_0.gguf`.

Next we do something similar for the audio decoder. First download or check out
the model for the voice decoder:
```console
$ pushd models
$ git clone --branch main --single-branch --depth 1 https://huggingface.co/novateur/WavTokenizer-large-speech-75token
$ cd WavTokenizer-large-speech-75token && git lfs install && git lfs pull
$ popd
```
This model file is a PyTorch checkpoint (.ckpt) and we first need to convert it
to Hugging Face format:
```console
(venv) python examples/tts/convert_pt_to_hf.py \
    models/WavTokenizer-large-speech-75token/wavtokenizer_large_speech_320_24k.ckpt
...
Model has been successfully converted and saved to models/WavTokenizer-large-speech-75token/model.safetensors
Metadata has been saved to models/WavTokenizer-large-speech-75token/index.json
Config has been saved to models/WavTokenizer-large-speech-75token/config.json
```
Then we can convert the Hugging Face format to gguf:
```console
(venv) python convert_hf_to_gguf.py models/WavTokenizer-large-speech-75token \
    --outfile models/wavtokenizer-large-75-f16.gguf --outtype f16
...
INFO:hf-to-gguf:Model successfully exported to models/wavtokenizer-large-75-f16.gguf
```
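To sanity-check a converted file, its metadata can be inspected with the
`gguf_dump.py` script from this repository's `gguf-py` package; a sketch,
assuming the script's current location and that its dependencies are installed
in the virtual environment:
```console
(venv) python gguf-py/scripts/gguf_dump.py models/wavtokenizer-large-75-f16.gguf
```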

### Running the example

With both models generated (the LLM model and the voice decoder model), we can
run the example:
```console
$ build/bin/llama-tts -m ./models/outetts-0.2-0.5B-q8_0.gguf \
    -mv ./models/wavtokenizer-large-75-f16.gguf \
    -p "Hello world"
...
main: audio written to file 'output.wav'
```
The `output.wav` file will contain the audio of the prompt. This can be heard
by playing the file with a media player. On Linux the following command will
play the audio:
```console
$ aplay output.wav
```
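On macOS, which does not ship `aplay`, the system `afplay` utility plays the
file instead:
```console
$ afplay output.wav
```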