Expected Behavior
The `simple` example (./simple) should run without issue with TheBloke's Llama-2-7b-Chat-GGUF.
Current Behavior
```
./simple ~/.cache/huggingface/hub/models--TheBloke--Llama-2-7b-Chat-GGUF/blobs/08a5566d61d7cb6b420c3e4387a39e0078e1f2fe5f055f3a03887385304d4bfa
```

(model: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF)

results in

```
Hello my name isSegmentation fault (core dumped)
```
The model works fine with `main`.
I'm running the latest Ubuntu with everything up to date, compiled with `make` (no CUDA, etc.).
The line that fails is llama.cpp:1453, inside `llama_kv_cache_find_slot`:

```cpp
cache.cells[cache.head + i].seq_id.insert(batch.seq_id[i][j]);
```
The initialization of `llama_batch::seq_id` in simple.cpp seems suspect: if `seq_id[i]` is a per-token pointer, assigning 0 to it nulls exactly what the line above dereferences. But I'm not knowledgeable enough about what `seq_id` should contain to fix it.
```cpp
llama_batch batch = llama_batch_init(512, 0, 1);

// evaluate the initial prompt
batch.n_tokens = tokens_list.size();

for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]  = tokens_list[i];
    batch.pos[i]    = i;
    batch.seq_id[i] = 0;
    batch.logits[i] = false;
}

// llama_decode will output logits only for the last token of the prompt
batch.logits[batch.n_tokens - 1] = true;
```

Time permitting, I may take a stab at porting over whatever seems to be working for `main`.
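For comparison, here is a minimal sketch of how the loop might look if `seq_id[i]` is indeed a pointer into per-token storage allocated by `llama_batch_init` (the `n_seq_id` field and the one-sequence-per-token setup are my assumptions based on the crashing line, not a verified fix):

```cpp
// Sketch under assumptions: llama_batch_init(512, 0, 1) allocates at least
// one llama_seq_id slot per token, and llama_batch carries a per-token
// n_seq_id counter that llama_kv_cache_find_slot iterates over.
llama_batch batch = llama_batch_init(512, 0, 1);

batch.n_tokens = tokens_list.size();

for (int32_t i = 0; i < batch.n_tokens; i++) {
    batch.token[i]     = tokens_list[i];
    batch.pos[i]       = i;
    batch.n_seq_id[i]  = 1;   // each prompt token belongs to one sequence
    batch.seq_id[i][0] = 0;   // write sequence id 0 into the allocated array
                              // instead of overwriting the pointer itself
    batch.logits[i]    = false;
}

// llama_decode will output logits only for the last token of the prompt
batch.logits[batch.n_tokens - 1] = true;
```

With that, `batch.seq_id[i][j]` in `llama_kv_cache_find_slot` would read a valid entry rather than dereferencing a null pointer.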