Just did a very simple run with llama-7b-4bit. It... took a while. Had it run in a screen. But, it worked!
root@FriendlyWrt /s/o/llama.cpp (master)# time ./main --color -m models/ggml-model-q4_0.bin -p "Hello there!"
main: seed = 1680443840
llama_model_load: loading model from 'models/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-model-q4_0.bin'
llama_model_load: model size = 4017.27 MB / num tensors = 291
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0
Hello there! I am a freelance illustrator based in New Zealand. I grew up with an appreciation for the natural world, which has inspired me to create my work through observation and playful experimentation.
My focus is on watercolour painting (in particular), as well as digital art & animation. My style is bright & bold, vibrant, dynamic & colourful - I love animals!
I am always keen to collaborate with other artists/creatives, so if you are interested in working together please feel free to drop me a line. [end of text]
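The sampling line in the log (temp 0.8, top_k 40, top_p 0.95, repeat_penalty 1.1 over the last 64 tokens) describes a standard sampling pipeline. A rough Python sketch of what those parameters do — a simplified illustration, not llama.cpp's actual implementation, with `sample_token` being a hypothetical helper name:

```python
import math
import random

def sample_token(logits, prev_tokens, temp=0.8, top_k=40, top_p=0.95,
                 repeat_last_n=64, repeat_penalty=1.1):
    """Sketch of the sampling order: repeat penalty -> temperature ->
    top-k -> top-p (nucleus) -> random draw."""
    logits = list(logits)
    # Penalize tokens that appeared in the last repeat_last_n positions.
    for tok in set(prev_tokens[-repeat_last_n:]):
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty
        else:
            logits[tok] *= repeat_penalty
    # Temperature scaling: lower temp sharpens the distribution.
    logits = [l / temp for l in logits]
    # Keep only the top_k highest-logit candidates (descending order).
    candidates = sorted(range(len(logits)),
                        key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors.
    m = max(logits[i] for i in candidates)
    exps = [math.exp(logits[i] - m) for i in candidates]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus cut: smallest prefix whose cumulative mass reaches top_p.
    cum, cut = 0.0, len(candidates)
    for n, p in enumerate(probs, start=1):
        cum += p
        if cum >= top_p:
            cut = n
            break
    candidates, probs = candidates[:cut], probs[:cut]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Draw one token from the trimmed, renormalized distribution.
    return random.choices(candidates, weights=probs, k=1)[0]
```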
llama_print_timings: load time = 93487.23 ms
llama_print_timings: sample time = 704.72 ms / 115 runs ( 6.13 ms per run)
llama_print_timings: prompt eval time = 92466.10 ms / 4 tokens (23116.52 ms per token)
llama_print_timings: eval time = 11195694.23 ms / 114 runs (98207.84 ms per run)
llama_print_timings: total time = 11289895.19 ms
________________________________________________________
Executed in  188.18 mins      fish         external
   usr time  324.60 mins    0.00 millis  324.60 mins
   sys time   11.70 mins    1.70 millis   11.70 mins
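For scale, the eval timings above work out to roughly one token per 98 seconds:

```python
# Back-of-envelope throughput from the llama_print_timings block above.
eval_ms = 11_195_694.23   # total eval time reported
eval_runs = 114           # tokens generated

ms_per_token = eval_ms / eval_runs      # matches the 98207.84 ms/run in the log
tokens_per_sec = 1000.0 / ms_per_token  # roughly 0.0102 tokens/s
print(f"{ms_per_token:.2f} ms/token, {tokens_per_sec:.4f} tokens/s")
```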
Model was loaded from an external microSD card via the internal bus.
I'm quite amazed this worked at all, honestly.
CPU Info in detail:
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: ARM
Model name: Cortex-A55
Model: 0
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r2p0
CPU(s) scaling MHz: 100%
CPU max MHz: 1800.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: Cortex-A76
Model: 0
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
Stepping: r4p0
CPU(s) scaling MHz: 68%
CPU max MHz: 2352.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
L1d: 384 KiB (8 instances)
L1i: 384 KiB (8 instances)
L2: 2.5 MiB (8 instances)
L3: 3 MiB (1 instance)
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Vulnerable: Unprivileged eBPF enabled
Srbds: Not affected
Tsx async abort: Not affected
(/proc/cpuinfo doesn't give any more useful details here, sadly.)
Hardware is a FriendlyElec NanoPi R6s.
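One thing worth trying on this big.LITTLE layout: the run used all 8 threads, so the four slow Cortex-A55 cores drag down the four A76s. Pinning llama.cpp to the big cluster only might help. The sketch below assumes the usual RK3588S numbering (A55 = cpu0-3, A76 = cpu4-7) — verify with cpufreq before pinning:

```shell
# List each core's max clock so the big (A76) cores can be identified;
# on RK3588S boards the A55s are usually cpu0-3 and the A76s cpu4-7.
for c in /sys/devices/system/cpu/cpu[0-7]; do
  echo "$c: $(cat "$c/cpufreq/cpuinfo_max_freq" 2>/dev/null || echo n/a)"
done
# Then pin to the big cluster with a matching thread count, e.g.:
#   taskset -c 4-7 ./main --color -m models/ggml-model-q4_0.bin -t 4 -p "Hello there!"
```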