Hi,
Just tested on RISC-V boards:
4xC910 @ 2.0 GHz TH1520, LicheePi4A (https://sipeed.com/licheepi4a) with 16 GB LPDDR4X.
About 6 s/token without any instruction acceleration; it should be under 5 s/token when boosted to 2.5 GHz.
llama_model_load: ggml ctx size = 668.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: prompt: 'They'
main: number of tokens in prompt = 2
1 -> ''
15597 -> 'They'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
They are now available for sale at the cost of Rs 20,5
main: mem per token = 14368644 bytes
main: load time = 91.25 ms
main: sample time = 39.22 ms
main: predict time = 105365.27 ms / 6197.96 ms per token
main: total time = 129801.62 ms
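(For reference, assuming per-token time scales roughly inversely with core clock, 6197.96 ms/token at 2.0 GHz would scale to about 6197.96 × 2.0 / 2.5 ≈ 4958 ms/token at 2.5 GHz, which is where the under-5-s/token estimate above comes from.)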
1xC906 @ 1.0 GHz D1, LicheeRV with 1 GB DDR3.
About 180 s/token without any instruction acceleration; it is very slow due to the lack of memory.
main: mem per token = 14368644 bytes
main: load time = 1412.77 ms
main: sample time = 185.77 ms
main: predict time = 3171739.00 ms / 186572.88 ms per token
main: total time = 3609667.50 ms
Note that the ggml ctx size is 668 MB, not 4668 MB: I hacked the code so that low-memory (>= 512 MB) devices can run llama, and it does not use swap, since treating the SD card as memory would wear it out quickly.
Should this feature be added upstream?
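For anyone curious what such a low-memory mode could look like: one common approach is to memory-map the weights file instead of reading it into an allocated buffer, so pages are faulted in from the file on demand and can be dropped by the kernel without touching swap. This is only a minimal sketch of that idea, not the actual patch used here; the function name is hypothetical.

```cpp
// Hypothetical sketch: map the model file read-only with mmap so weights are
// paged in on demand and evicted without swap. NOT the author's actual hack.
#include <cstddef>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

// Returns a read-only mapping of the whole model file, or nullptr on failure.
void * map_model_file(const char * path, size_t * size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return nullptr;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return nullptr;
    }
    void * addr = mmap(nullptr, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) {
        return nullptr;
    }
    *size_out = (size_t) st.st_size;
    return addr;
}
```

With a mapping like this, tensor data can point directly into the mapped region instead of being copied into the ggml context, which is one way the reported ctx size could drop well below the 4 GB model size.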
And here is a time-lapse recording of the D1 running the llama 7B model; it is super slow even at 120x speedup, but it works!