Hi,
Just tested on RISC-V boards:
4xC910 @ 2.0 GHz TH1520, LicheePi4A (https://sipeed.com/licheepi4a) with 16 GB LPDDR4X.
About 6 s/token without any instruction acceleration; it should be under 5 s/token when boosted to 2.5 GHz.
llama_model_load: ggml ctx size = 668.34 MB
llama_model_load: memory_size = 512.00 MB, n_mem = 16384
llama_model_load: .................................... done
llama_model_load: model size = 4017.27 MB / num tensors = 291
system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: prompt: 'They'
main: number of tokens in prompt = 2
1 -> ''
15597 -> 'They'
sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000
They are now available for sale at the cost of Rs 20,5
main: mem per token = 14368644 bytes
main: load time = 91.25 ms
main: sample time = 39.22 ms
main: predict time = 105365.27 ms / 6197.96 ms per token
main: total time = 129801.62 ms
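(For reference, assuming per-token time scales roughly inversely with core clock, 6197.96 ms/token at 2.0 GHz would scale to about 6197.96 × 2.0 / 2.5 ≈ 4958 ms/token at 2.5 GHz, which is where the under-5-s/token estimate above comes from.)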
1xC906 @ 1.0 GHz D1, LicheeRV with 1 GB DDR3.
About 180 s/token without any instruction acceleration; it is very slow due to the lack of memory.
main: mem per token = 14368644 bytes
main: load time = 1412.77 ms
main: sample time = 185.77 ms
main: predict time = 3171739.00 ms / 186572.88 ms per token
main: total time = 3609667.50 ms
Note that the ggml ctx size is 668 MB, not 4668 MB: I hacked the code so that low-memory (>= 512 MB) devices can run llama, and it does not use swap, since treating the SD card as memory would wear it out quickly.
Should this feature be added upstream?
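For anyone curious what such a low-memory mode could look like: one common approach is to memory-map the weights file instead of reading it into an allocated buffer, so pages are faulted in from the file on demand and can be dropped by the kernel without touching swap. This is only a minimal sketch of that idea, not the actual patch used here; the function name is hypothetical.

```cpp
// Hypothetical sketch: map the model file read-only with mmap so weights are
// paged in on demand and evicted without swap. NOT the author's actual hack.
#include <cstddef>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

// Returns a read-only mapping of the whole model file, or nullptr on failure.
void * map_model_file(const char * path, size_t * size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return nullptr;
    }
    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return nullptr;
    }
    void * addr = mmap(nullptr, (size_t) st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) {
        return nullptr;
    }
    *size_out = (size_t) st.st_size;
    return addr;
}
```

With a mapping like this, tensor data can point directly into the mapped region instead of being copied into the ggml context, which is one way the reported ctx size could drop well below the 4 GB model size.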
And here is a time-lapse recording of the D1 running the llama 7B model; it is super slow even at 120x speedup, but it works!