
GGUF converted model won't inference when --instruct is set. #2741

@Rotatingxenomorph

Description

Everything works fine with the pre-GGUF llama.cpp. I converted the GGML model to GGUF, and it runs fine without --instruct, but not with it.
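
For reference, the conversion was done with the GGML-to-GGUF script shipped in the llama.cpp tree. Roughly like this (the script name and flags below are my best recollection for that build and may not match your revision exactly; 70B needs the grouped-query attention factor, which I believe was passed as --gqa 8):

python convert-llama-ggmlv3-to-gguf.py -i llama-2-70b-chat.ggmlv3.q6_K.bin -o llama-2-70b-chat.gguf.q6_K.bin --gqa 8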

Expected Behavior

The GGUF-converted llama-2-70b-chat.gguf.q6_K.bin should run inference with --instruct.
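
(For context, as I understand it, --instruct puts main into interactive instruct mode and wraps each typed turn in an Alpaca-style template, roughly:

### Instruction:
<user input>

### Response:

so "working" here means the model loads, drops into that interactive prompt, and generates a response, as it did with the pre-GGUF build.)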

Current Behavior

It does not run inference.

Environment and Context

llama.cpp Windows AVX2 release build: https://github.com/ggerganov/llama.cpp/releases/download/master-8207214/llama-master-8207214-bin-win-avx2-x64.zip (main: build = 1033 (8207214)), running on Windows 10 in PowerShell.

./main -t 5 -m llama-2-70b-chat.gguf.q6_K.bin --instruct

main: build = 1033 (8207214)
main: seed = 1692794398
llama_model_loader: loaded meta data with 15 key-value pairs and 723 tensors from K:\aimodels\llama-2-70b-chat.gguf.q6_K.bin (ve…)
llama_model_loader: - tensor 0: token_embd.weight q6_K [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 1: output_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 2: output.weight q6_K [ 8192, 32000, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 10: blk.0.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 11: blk.0.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 19: blk.1.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 20: blk.1.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 28: blk.2.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 29: blk.2.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.ffn_gate.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_down.weight q6_K [ 28672, 8192, 1, 1 ]
llama_model_loader: - tensor 37: blk.3.ffn_up.weight q6_K [ 8192, 28672, 1, 1 ]
llama_model_loader: - tensor 38: blk.3.ffn_norm.weight f32 [ 8192, 1, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_q.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_k.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.attn_v.weight q6_K [ 8192, 1024, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.attn_output.weight q6_K [ 8192, 8192, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.attn_norm.weight f32 [ 8192, 1, 1, 1 ]
etc.
