Project status #3471

ggerganov · 2023-10-04T13:49:51Z

ggerganov
Oct 4, 2023
Maintainer

[NO LONGER UPDATED]

Below is a summary of the functionality provided by the llama.cpp project.

The goal is to have a birds-eye-view of what works and what does not
Collaborators are encouraged to add things to the list and update the status of existing things as needed
The list should be simple without too much details about the specific problems - these belong to dedicated issues

Legend (feel free to update):

✅ - Working correctly
☁️ - Partially working
❌ - Failing
❓ - Status unknown (needs testing)
🔬 - Under investigation
🚧 - Currently in development

Feature	Executable	Status	Issues
Inference
Single-batch decoding	`main`, `simple`	✅
Parallel / batched decoding	`batched`	✅
Continuous batching	`parallel`	✅
Speculative sampling	`speculative`	✅
Tree-based speculative sampling	`speculative`	✅
Self-speculative sampling	`speculative`	🚧	#3565
Lookahead sampling	`lookahead`	✅
Infill	`infill`	✅
REST API	`server`	✅
Embeddings	`embedding`	✅
Grouped Query Attention CPU	`main`	✅
Grouped Query Attention CUDA	`main`	✅
Grouped Query Attention OpenCL	`main`	✅
Grouped Query Attention Metal	`main`	✅
Session load / save	`main`	✅
K-quants (256) CUDA	`main`	✅
K-quants (64) CUDA	`main`	✅
K-quants (256) Metal	`main`	✅
K-quants (64) Metal	`main`	☁️	#3276
Special tokens	`main`	✅
Grammar sampling	`main`, `server`	✅
Beam search	`beam-search`	❓	#3471 (comment)
LoRA	`main`	☁️	#3333 #3519
SPM tokenizer	`test-tokenizer-0-llama`	✅
BPE tokenizer	`test-tokenizer-0-falcon`	✅
Models
LLaMA v1	`main`	✅
LLaMA v2	`main`	✅
Falcon	`main`	✅
StarCoder	`main`	✅
Baichuan	`main`	✅
MPT	`main`	✅
Persimmon	`main`	✅
LLaVA	`llava`	✅
Refact	`main`	✅
Bloom	`main`	✅
StableLM-3b-4e1t	`main`	✅
Training
Finetuning CPU	`finetune`	✅
Finetuning Metal	`finetune`	🔬
Backends
CPU x64	`ggml`	✅
CPU Arm	`ggml`	✅
GPU CUDA	`ggml-cuda`	✅
GPU ROCm	`ggml-cuda`	✅
GPU Metal	`ggml-metal`	✅
GPU OpenCL	`ggml-opencl`	✅
GPU Vulkan	`ggml-vulkan`	🚧	#2059

niansa · 2023-10-04T18:45:37Z

niansa
Oct 4, 2023

What does the "☁️" mean?

2 replies

shibe2 Oct 4, 2023

I don't know what the icon means, but current status of OpenCL back-end is: it works with supported models, but is buggy and perhaps, slower than it could be.

ggerganov Oct 4, 2023
Maintainer Author

Yup, this was my impression from reading a few issues lately. If you think it's not the case, feel free to update it. I just haven't set up OpenCL in my environment and cannot do tests atm

ScarletEmerald · 2023-10-05T04:07:09Z

ScarletEmerald
Oct 5, 2023

So "Parallel decoding" is done by batched and "Continuous batching" is done by parallel? Are these reversed?

1 reply

ggerganov Oct 5, 2023
Maintainer Author

Parallel decoding is also called "batched decoding" hence batched. The parallel example demonstrates a basic server that serves clients in parallel - it just happens to have the continuous batching feature as an option.

Naming things is hard :) Sorry if these are confusing

slaren · 2023-10-07T22:37:00Z

slaren
Oct 7, 2023
Maintainer

Should beam search be added here? I think it is broken atm, at least with CUDA.

4 replies

ggerganov Oct 8, 2023
Maintainer Author

Yes, it should be added. The list is far from complete

Mihaiii Oct 8, 2023

Fwiw, for me beam search is broken even without CUDA in a sense that when I run the example, nothing happens (it just hangs for minutes at this line until I CTRL+C it).

If it's an unknown problem, I'll open an issue (tbh, it's strange that nobody mentioned it before so maybe I'm doing something wrong).

Update: when it hangs on the above mentioned line, I have 0 hard page fauls/sec.

slaren Oct 8, 2023
Maintainer

With CUDA it works for a while, but then it starts generating gibberish. I think that the calls to llama_decode are failing and it is not catching it. It's probably missing some KV cache management after the batched decoding change.

ggerganov Oct 18, 2023
Maintainer Author

The beam search functionality should be moved out from the library and implemented as a standalone example.

shibe2 · 2023-10-18T19:41:25Z

shibe2
Oct 18, 2023

What would be criteria for considering OpenCL back-end working correctly? I've fixed all known bugs in ggml-opencl.cpp and now working on refactoring like #3669.

3 replies

ggerganov Oct 18, 2023
Maintainer Author

The criteria is that if it runs correctly on your machine, then it is ✅ until someone reports a problem that is reproducible - then it becomes ☁️ or ❌ depending on how broken the thing is

shibe2 Oct 18, 2023

Alright, turning the green light then!

Yossef-Dawoad Nov 27, 2023

maybe you can ditch the icons for something Like Scoring Like (A+, A, A-, B, ...) this will make it obvious if something working fine but needs improvements has a score with A- and so on, maybe something like this :
[ A+ ] or [ A ] : working like charm
[ A- ] : Working correctly but needs improvement
[ B ] : Partially working
[ B- ] : Partially working with big issues to be resolved
[ C ] : Status unknown (needs testing)
[ D+ ] : Under investigation
[ D ] : Currently in development
[ F ] : Failing

maybe you should add a column for tier support for example, if a feature is tier 1 or 2, ... what do you think?

pareek-ml · 2025-04-15T22:29:05Z

pareek-ml
Apr 15, 2025

Is there any further progress on Finetuning on metal gpu?

0 replies

Project status #3471

Uh oh!

Uh oh!

ggerganov Oct 4, 2023 Maintainer

[NO LONGER UPDATED]

Replies: 5 comments · 10 replies

Uh oh!

Uh oh!

Uh oh!

ggerganov Oct 4, 2023 Maintainer Author

Uh oh!

Uh oh!

ggerganov Oct 5, 2023 Maintainer Author

Uh oh!

slaren Oct 7, 2023 Maintainer

Uh oh!

ggerganov Oct 8, 2023 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

slaren Oct 8, 2023 Maintainer

Uh oh!

ggerganov Oct 18, 2023 Maintainer Author

Uh oh!

Uh oh!

ggerganov Oct 18, 2023 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

ggerganov
Oct 4, 2023
Maintainer

Replies: 5 comments 10 replies

ggerganov Oct 4, 2023
Maintainer Author

ggerganov Oct 5, 2023
Maintainer Author

slaren
Oct 7, 2023
Maintainer

ggerganov Oct 8, 2023
Maintainer Author

slaren Oct 8, 2023
Maintainer

ggerganov Oct 18, 2023
Maintainer Author

ggerganov Oct 18, 2023
Maintainer Author