Description
Everything works fine with the pre-GGUF llama.cpp. After converting the GGML model to GGUF, it runs fine without --instruct, but not with it.
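The report does not say which script or flags were used for the conversion. A plausible command, assuming the convert-llama-ggmlv3-to-gguf.py script that shipped with llama.cpp around this build (the input filename, --gqa 8 for the 70B GQA layout, and --eps 1e-5 for LLaMA-2 are assumptions, not taken from the report):

# Hypothetical GGML-v3 to GGUF conversion; filenames and flags are assumed
python convert-llama-ggmlv3-to-gguf.py \
  --input llama-2-70b-chat.ggmlv3.q6_K.bin \
  --output llama-2-70b-chat.gguf.q6_K.bin \
  --gqa 8 --eps 1e-5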
Expected Behavior
The GGUF-converted llama-2-70b-chat.gguf.q6_K.bin should run with --instruct.
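For context, --instruct is expected to wrap each user input in the Alpaca-style prompt that main uses in instruct mode, roughly as below (prefix and suffix as hardcoded in main.cpp around this build; shown only to illustrate what the flag should do):

### Instruction:

<user input>

### Response: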
Current Behavior
Inference does not run.
Environment and Context
llama.cpp Windows AVX2 build: https://github.com/ggerganov/llama.cpp/releases/download/master-8207214/llama-master-8207214-bin-win-avx2-x64.zip (main: build = 1033 (8207214)), Windows 10, PowerShell
./main -t 5 -m llama-2-70b-chat.gguf.q6_K.bin --instruct
main: build = 1033 (8207214)
main: seed = 1692794398
llama_model_loader: loaded meta data with 15 key-value pairs and 723 tensors from K:\aimodels\llama-2-70b-chat.gguf.q6_K.bin (version …)
llama_model_loader: - tensor 0: token_embd.weight q6_K [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: output_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 2: output.weight q6_K [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 37: blk.3.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 38: blk.3.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
etc. (the remaining tensor lines follow the same pattern)