Just did a very simple run with llama-7b-4bit. It... took a while. Had it run in a screen. But, it worked!
root@FriendlyWrt /s/o/llama.cpp (master)# time ./main --color -m models/ggml-model-q4_0.bin -p "Hello there!"
main: seed = 1680443840
llama_model_load: loading model from 'models/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 512
llama_model_load: n_embd = 4096
llama_model_load: n_mult = 256
llama_model_load: n_head = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 11008
llama_model_load: n_parts = 1
llama_model_load: type = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size = 81.25 KB
llama_model_load: mem required = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from 'models/ggml-model-q4_0.bin'
llama_model_load: model size = 4017.27 MB / num tensors = 291
llama_init_from_file: kv self size = 256.00 MB
system_info: n_threads = 8 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0
Hello there! I am a freelance illustrator based in New Zealand. I grew up with an appreciation for the natural world, which has inspired me to create my work through observation and playful experimentation.
My focus is on watercolour painting (in particular), as well as digital art & animation. My style is bright & bold, vibrant, dynamic & colourful - I love animals!
I am always keen to collaborate with other artists/creatives, so if you are interested in working together please feel free to drop me a line. [end of text]
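The sampling line in the log (temp 0.8, top_k 40, top_p 0.95, repeat_penalty 1.1 over the last 64 tokens) describes a standard sampling pipeline. A rough Python sketch of what those parameters do — a simplified illustration, not llama.cpp's actual implementation, with `sample_token` being a hypothetical helper name:

```python
import math
import random

def sample_token(logits, prev_tokens, temp=0.8, top_k=40, top_p=0.95,
                 repeat_last_n=64, repeat_penalty=1.1):
    """Sketch of the sampling order: repeat penalty -> temperature ->
    top-k -> top-p (nucleus) -> random draw."""
    logits = list(logits)
    # Penalize tokens that appeared in the last repeat_last_n positions.
    for tok in set(prev_tokens[-repeat_last_n:]):
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty
        else:
            logits[tok] *= repeat_penalty
    # Temperature scaling: lower temp sharpens the distribution.
    logits = [l / temp for l in logits]
    # Keep only the top_k highest-logit candidates (descending order).
    candidates = sorted(range(len(logits)),
                        key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the survivors.
    m = max(logits[i] for i in candidates)
    exps = [math.exp(logits[i] - m) for i in candidates]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus cut: smallest prefix whose cumulative mass reaches top_p.
    cum, cut = 0.0, len(candidates)
    for n, p in enumerate(probs, start=1):
        cum += p
        if cum >= top_p:
            cut = n
            break
    candidates, probs = candidates[:cut], probs[:cut]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Draw one token from the trimmed, renormalized distribution.
    return random.choices(candidates, weights=probs, k=1)[0]
```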
llama_print_timings: load time = 93487.23 ms
llama_print_timings: sample time = 704.72 ms / 115 runs ( 6.13 ms per run)
llama_print_timings: prompt eval time = 92466.10 ms / 4 tokens (23116.52 ms per token)
llama_print_timings: eval time = 11195694.23 ms / 114 runs (98207.84 ms per run)
llama_print_timings: total time = 11289895.19 ms
________________________________________________________
Executed in  188.18 mins      fish         external
   usr time  324.60 mins    0.00 millis  324.60 mins
   sys time   11.70 mins    1.70 millis   11.70 mins
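For scale, the eval timings above work out to roughly one token per 98 seconds:

```python
# Back-of-envelope throughput from the llama_print_timings block above.
eval_ms = 11_195_694.23   # total eval time reported
eval_runs = 114           # tokens generated

ms_per_token = eval_ms / eval_runs      # matches the 98207.84 ms/run in the log
tokens_per_sec = 1000.0 / ms_per_token  # roughly 0.0102 tokens/s
print(f"{ms_per_token:.2f} ms/token, {tokens_per_sec:.4f} tokens/s")
```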
Model was loaded from an external microSD card via the internal bus.
I'm quite amazed this worked at all, honestly.
CPU Info in detail:
# lscpu
Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Vendor ID: ARM
Model name: Cortex-A55
Model: 0
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: r2p0
CPU(s) scaling MHz: 100%
CPU max MHz: 1800.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Model name: Cortex-A76
Model: 0
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 2
Stepping: r4p0
CPU(s) scaling MHz: 68%
CPU max MHz: 2352.0000
CPU min MHz: 408.0000
BogoMIPS: 48.00
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
Caches (sum of all):
L1d: 384 KiB (8 instances)
L1i: 384 KiB (8 instances)
L2: 2.5 MiB (8 instances)
L3: 3 MiB (1 instance)
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; __user pointer sanitization
Spectre v2: Vulnerable: Unprivileged eBPF enabled
Srbds: Not affected
Tsx async abort: Not affected
(/proc/cpuinfo doesn't give any more useful details here, sadly.)
Hardware is a FriendlyElec NanoPi R6s.
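One thing worth trying on this big.LITTLE layout: the run used all 8 threads, so the four slow Cortex-A55 cores drag down the four A76s. Pinning llama.cpp to the big cluster only might help. The sketch below assumes the usual RK3588S numbering (A55 = cpu0-3, A76 = cpu4-7) — verify with cpufreq before pinning:

```shell
# List each core's max clock so the big (A76) cores can be identified;
# on RK3588S boards the A55s are usually cpu0-3 and the A76s cpu4-7.
for c in /sys/devices/system/cpu/cpu[0-7]; do
  echo "$c: $(cat "$c/cpufreq/cpuinfo_max_freq" 2>/dev/null || echo n/a)"
done
# Then pin to the big cluster with a matching thread count, e.g.:
#   taskset -c 4-7 ./main --color -m models/ggml-model-q4_0.bin -t 4 -p "Hello there!"
```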