README.md (+7 -7)
@@ -16,7 +16,7 @@ This will entail:
 - [ ] Give GLaDOS vision via [LLaVA](https://llava-vl.github.io/)
 - [ ] Create 3D-printable parts
 - [ ] Design the animatronics system
-
+
 
 
 ## Software Architecture
@@ -25,8 +25,8 @@ The initial goals are to develop a low-latency platform, where GLaDOS can respon
 To do this, the system constantly records data to a circular buffer, waiting for [voice to be detected](https://github.com/snakers4/silero-vad). When it's determined that the voice has stopped (including detection of normal pauses), it is [transcribed quickly](https://github.com/huggingface/distil-whisper). The transcription is then passed to a streaming [local Large Language Model](https://github.com/ggerganov/llama.cpp), whose streamed output is broken into sentences and passed to a [text-to-speech system](https://github.com/rhasspy/piper). This means further sentences can be generated while the current one is playing, reducing latency substantially.
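For a concrete picture of the sentence-streaming trick described above, here is a minimal sketch. The `llm_token_stream()` and `synthesize()` helpers are placeholders for the real LLM client and TTS/audio path, not the project's actual code.

```python
# Minimal sketch: cut sentences out of the LLM's token stream as soon as they
# complete, and hand them to a separate playback thread while generation continues.
import queue
import re
import threading


def llm_token_stream():
    """Placeholder for a streaming LLM client; yields text fragments."""
    yield from ["Hello! ", "I am still ", "calculating. ", "Please hold."]


def synthesize(sentence: str) -> None:
    """Placeholder for the text-to-speech and audio playback step."""
    print(f"[speaking] {sentence}")


def speaker(sentences: queue.Queue) -> None:
    # Play sentences as they arrive; None is the end-of-stream sentinel.
    while (sentence := sentences.get()) is not None:
        synthesize(sentence)


def stream_and_speak() -> None:
    sentences: queue.Queue = queue.Queue()
    threading.Thread(target=speaker, args=(sentences,)).start()

    buffer = ""
    for fragment in llm_token_stream():
        buffer += fragment
        # Split on sentence-ending punctuation; keep the trailing partial sentence.
        *complete, buffer = re.split(r"(?<=[.!?])\s+", buffer)
        for sentence in complete:
            sentences.put(sentence)      # speak finished sentences immediately
    if buffer.strip():
        sentences.put(buffer.strip())    # flush whatever remains
    sentences.put(None)


if __name__ == "__main__":
    stream_and_speak()
```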
 
 ### Subgoals
-- The other aim of the project is to minimize dependencies, so this can run on constrained hardware. That means no PyTorch or other large packages.
-- As I want to fully understand the system, I have removed a large amount of indirection, which means extracting and rewriting code. For example, as GLaDOS only speaks English, I have rewritten the wrapper around [espeak](https://espeak.sourceforge.net/), and the entire Text-to-Speech subsystem is about 500 LOC with only 3 dependencies: numpy, onnxruntime, and sounddevice.
+- The other aim of the project is to minimize dependencies, so this can run on constrained hardware. That means no PyTorch or other large packages.
+- As I want to fully understand the system, I have removed a large amount of indirection, which means extracting and rewriting code. For example, as GLaDOS only speaks English, I have rewritten the wrapper around [espeak](https://espeak.sourceforge.net/), and the entire Text-to-Speech subsystem is about 500 LOC with only 3 dependencies: numpy, onnxruntime, and sounddevice.
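To make the "3 dependencies" point concrete, a minimal ONNX-based TTS path can look roughly like the sketch below. The model path, input/output names, and the `text_to_phoneme_ids()` phonemizer helper are assumptions for illustration, not the actual GLaDOS API.

```python
# Rough sketch of a numpy + onnxruntime + sounddevice TTS path: phoneme IDs go
# into an exported voice model, raw audio comes out and is played directly.
import numpy as np
import onnxruntime as ort
import sounddevice as sd


def text_to_phoneme_ids(text: str) -> list[int]:
    """Placeholder for the espeak-based phonemizer; returns model vocabulary IDs."""
    raise NotImplementedError("wire this up to your phonemizer")


def speak(text: str, model_path: str = "models/tts.onnx", sample_rate: int = 22050) -> None:
    session = ort.InferenceSession(model_path)  # load the exported voice model

    ids = np.array([text_to_phoneme_ids(text)], dtype=np.int64)   # shape (1, T)
    lengths = np.array([ids.shape[1]], dtype=np.int64)            # phoneme count
    scales = np.array([0.667, 1.0, 0.8], dtype=np.float32)        # noise/length/width knobs

    # Input names here are assumed; check session.get_inputs() for the real ones.
    audio = session.run(None, {"input": ids, "input_lengths": lengths, "scales": scales})[0]

    sd.play(np.squeeze(audio), samplerate=sample_rate)  # play the synthesized waveform
    sd.wait()                                           # block until playback finishes
```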
 
 ## Hardware System
 This will be based on servo- and stepper-motors. 3D-printable STL files will be provided to create GLaDOS's body, and she will be given a set of animations to express herself. The vision system will allow her to track and turn toward people and things of interest.
@@ -36,7 +36,7 @@ This will be based on servo- and stepper-motors. 3D printable STL will be provid
 
 ### *New Simplified Windows Installation Process*
 Don't want to compile anything? Try this simplified process, but be aware it's still in the experimental stage!
-
+
 
 1. Open the Microsoft Store, search for `python` and install Python 3.12.
    a. To use Python 3.10, install `typing_extensions` and replace `import typing` in `glados/llama.py` with `import typing_extensions`.
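The Python 3.10 note above works because `typing_extensions` backports the newer `typing` features the module relies on. As an alternative to hand-editing `glados/llama.py`, a version-guarded import along these lines (a sketch, not the project's code) achieves the same thing:

```python
# Hypothetical compatibility shim (sketch): use the stdlib typing module on
# Python 3.11+ and fall back to typing_extensions on Python 3.10, where some
# newer typing features are missing from the standard library.
import sys

if sys.version_info >= (3, 11):
    import typing
else:
    import typing_extensions as typing  # requires: pip install typing_extensions
```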
@@ -65,7 +65,7 @@ If you are on Windows, I would recommend WSL with an Ubuntu image. Proper Windo
 and put them in the ".models" directory.
 4. For voice recognition, we use [Whisper.cpp](https://github.com/ggerganov/whisper.cpp).
    1. You can either download the compiled [whisper.cpp DLLs](https://github.com/ggerganov/whisper.cpp/releases) (recommended for Windows) and copy the DLL to the ./submodules/whisper.cpp directory
-   2. Or compile them yourself.
+   2. Or compile them yourself.
       1. To pull the code, from the GLaDOS directory use: `git submodule update --init --recursive`
       2. Move to the right subdirectory: `cd submodules/whisper.cpp`
       3. Compile for your system [(see the Documentation)](https://github.com/ggerganov/whisper.cpp), e.g.
@@ -77,8 +77,8 @@ If you are on Windows, I would recommend WSL with an Ubuntu image. Proper Windo
    1. Use: `git submodule update --init --recursive` to pull the llama.cpp repo
    2. Move to the right subdirectory: `cd submodules/llama.cpp`
    3. Compile llama.cpp [(see the Documentation)](https://github.com/ggerganov/llama.cpp)
-      1. Linux with [CUDA](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#cuda) `make server LLAMA_CUDA=1`
-      2. MacOS with [Metal](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#metal-build) `make server`
+      1. Linux with [CUDA](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#cuda) `make llama-server LLAMA_CUDA=1`
+      2. MacOS with [Metal](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#metal-build) `make llama-server`
 2. Use a commercial API or install an inference backend yourself, such as Ollama or Llamafile:
    1. Find and install a backend with an OpenAI-compatible API (most of them)
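Whichever route you take (a compiled `llama-server`, Ollama, Llamafile, or a commercial API), a quick way to confirm the backend is reachable is to hit its OpenAI-compatible chat endpoint. The URL, port, and model name below are assumptions to adjust for your setup:

```python
# Smoke test for an OpenAI-compatible backend (hypothetical URL/port/model name).
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # llama-server listens on 8080 by default
    json={
        "model": "local",  # many local backends ignore or remap this field
        "messages": [{"role": "user", "content": "Say hello as GLaDOS."}],
        "max_tokens": 64,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```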