Bug when prompt stored in --prompt-cache is longer than the new one #1585

Description

@aleksusklim

Edit: @DannyDaemonic here. (Sorry for the edit.) Here's a much simpler example. First build the cache like this:

./main -m /path/to/llama/bin --prompt-cache Z.cache -p "Here's a funny joke. No, wait, here's a bunch of Zs: Z Z Z Z Z Z Z Z Z Z"

Then try this:

./main -m /path/to/llama/bin --prompt-cache Z.cache -p "Here's a funny joke."

The joke will start with a Z every time; it seems the logits are not being re-evaluated for some reason. Changing even one token, including the very last one, works around the bug. The fix is to recalculate the logits.
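A minimal sketch of that fix, with hypothetical helper names rather than the actual patch: count the prefix shared between the saved session and the new prompt, and when the match covers the entire prompt, back off by one token so that re-evaluating it regenerates fresh logits.

```cpp
#include <cstddef>
#include <vector>

// Sketch (hypothetical, not the actual llama.cpp patch): determine how many
// saved-session tokens can be reused for the new prompt. If the match covers
// the ENTIRE prompt, back off by one token; re-evaluating that last token
// recomputes the logits instead of sampling from the stale ones left over
// from the longer cached prompt (the trailing "Z" in the example above).
std::size_t usable_session_tokens(const std::vector<int> & session_tokens,
                                  const std::vector<int> & prompt_tokens) {
    std::size_t n = 0;
    while (n < session_tokens.size() && n < prompt_tokens.size() &&
           session_tokens[n] == prompt_tokens[n]) {
        ++n;
    }
    if (n == prompt_tokens.size() && n > 0) {
        --n; // force re-evaluation of the final prompt token
    }
    return n; // reuse this many tokens; evaluate the rest normally
}
```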


What happens when a prompt stored in --prompt-cache is "longer" than the current one?

I want to make a wrapper around main.exe, but this behavior looks strange and buggy. I don't know whether it is actually a bug, but it is very confusing. I can post it as a separate issue if it really is a bug that cannot be solved in this PR.

So, here are my steps: (on version ee96541 and this model)

main.exe -t 6 -m WizardLM-7B-uncensored.ggml.q5_1.bin -c 512 --temp 0 --repeat_penalty 1.2 --prompt-cache cache1 -f in.txt >out.txt
(note the cache file and zero temperature)

My prompt inside in.txt is this:

### Instruction:

Describe your last dream in a few words.

### Response:

My

The model outputs this text inside out.txt (with an extra space before the first line, but let's assume I stripped it manually):

### Instruction:

Describe your last dream in a few words.

### Response:

My last dream was about being chased by a group of people through a forest, but I eventually found safety inside a church.

It also creates the prompt-cache file cache1 (about 13 MB) and writes to stderr:

main: attempting to load saved session from 'cache1'
main: session file does not exist, will create

If I repeat the same command as-is, it recreates the same text and does not update the cache file; stderr reads:

main: attempting to load saved session from 'cache1'
main: loaded a session with prompt size of 26 tokens
main: session file has exact match for prompt!

Then I copy the cache1 file to cache2. I also put the resulting text back into in.txt, this time cutting it after the last comma, so it becomes:

### Instruction:

Describe your last dream in a few words.

### Response:

My last dream was about being chased by a group of people through a forest,

Then I run again, pointing to cache2 (which is identical to cache1 at this point):
main.exe -t 6 -m WizardLM-7B-uncensored.ggml.q5_1.bin -c 512 --temp 0 --repeat_penalty 1.2 --prompt-cache cache2 -f in.txt >out.txt

It produces the exact same line, continuing with "but I eventually found safety inside a church." and stopping as before.
But this time, cache2 is updated to 21 MB. So far so good!
Its stderr said:

main: attempting to load saved session from 'cache2'
main: loaded a session with prompt size of 26 tokens
main: session file matches 26 / 42 tokens of prompt
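This is the expected fast path: the 26 matched tokens are reused as the already-evaluated context, and only the 16 new tokens are processed. A rough sketch of that flow, where eval_tokens is a hypothetical stand-in for the actual evaluation call (llama_eval in the 2023-era C API):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the model evaluation call: processes `toks`
// starting at KV-cache position `n_past`.
void eval_tokens(const std::vector<int> & toks, std::size_t n_past);

// Sketch of the prefix-reuse fast path: the matched prefix (26 tokens here)
// is already in the restored state, so only the unmatched suffix (16 tokens)
// needs evaluation before generation continues.
void continue_from_session(const std::vector<int> & prompt_tokens,
                           std::size_t n_matched) {
    const std::size_t n_past = n_matched; // keep the cached prefix
    const std::vector<int> suffix(prompt_tokens.begin() + n_matched,
                                  prompt_tokens.end());
    eval_tokens(suffix, n_past);
}
```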

Finally, I copy cache2 to cache3 and cut the prompt back to "My", exactly as in my very first in.txt.
I run, pointing to cache3, and get the following:

### Instruction:

Describe your last dream in a few words.

### Response:

My butterfly had wings of fire and flew through the night sky while I chased after it, trying to catch it so that we could be together forever.

File cache3 stays binary-identical to cache2, and among other output the program prints these lines to stderr:

main: attempting to load saved session from 'cache3'
main: loaded a session with prompt size of 42 tokens
main: session file has exact match for prompt!

What just happened? It looks like the program compared only the head of the prompt, disregarding the cached tail; then it assumed the cache was valid (although it is not) and continued generation from the cached state, ending up broken.
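A sketch of why the log still says "exact match", with hypothetical variable names: the check only asks whether the matched prefix covers the whole prompt, never whether the session has a stale tail beyond it.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical reconstruction of the match reporting, not the actual code.
// The first branch is taken whenever the prompt is a prefix of the saved
// session, so a 26-token prompt against a 42-token session still reports an
// "exact match" while 16 stale tokens remain live in the restored state.
void report_match(std::size_t n_matched, std::size_t n_prompt,
                  std::size_t n_session) {
    (void) n_session; // the bug illustrated: n_session is never consulted
    if (n_matched == n_prompt) {
        std::fprintf(stderr, "session file has exact match for prompt!\n");
    } else {
        std::fprintf(stderr, "session file matches %zu / %zu tokens of prompt\n",
                     n_matched, n_prompt);
    }
}
```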

My questions:

  1. Is it possible to "go back" by any number of tokens in the cache? Can we cut the prompt at an arbitrary place and keep the cache valid? Or is it only valid "after the final token", so we can go forward but cannot start earlier?
  2. If we cannot go back, is it possible to "restore the text" from the cache and print it explicitly? Then the program would show what is in the cache and what is causing its weird answers.
  3. If yes, can we store "checkpoints" in the cache with the full intermediate state, so that in my example, after seeing "My" at the end, it would reuse the older state it already had; or, if I stop at "My last dream", it would continue with "was about being chased by a group of people through a forest," (since that state was also recorded) and not break there?
  4. If yes, why can't we "discard all extra states" when seeking: for example, given "My last dream was dark", backtrack to "My" (which is stored), then evaluate just "last dream was dark" and generate from there? Just as if I had put in a copy of an older cache myself.
  5. Can't it at least throw an error on a mismatching cache (or discard it and regenerate from scratch) when it detects that the prompt is different?
  6. Can we get a "do not update the cache" command-line option? Or a "write the new cache to the specified file" option (which would also accept NUL to skip writing) so it will not destroy the previous state? Currently my only option is to copy the cache file each time before invoking the program, which delays startup because of OS filesystem cache eviction when the model barely fits in RAM already. (A rough sketch of this load-from-one-file, save-to-another flow follows below.)
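For question 6, here is a rough sketch, assuming the 2023-era llama.cpp session API (llama_load_session_file / llama_save_session_file; check the signatures in your checkout), of restoring the state from one file and saving the updated state to a different one, which is the behavior the requested option would expose:

```cpp
#include "llama.h"
#include <vector>

// Sketch, assuming the 2023-era session API: restore the state from in_path,
// then (after evaluation/generation) write the updated state to out_path,
// leaving the original cache file untouched.
void roundtrip_session(llama_context * ctx,
                       const char * in_path, const char * out_path) {
    std::vector<llama_token> tokens(llama_n_ctx(ctx));
    size_t n_loaded = 0;
    if (llama_load_session_file(ctx, in_path,
                                tokens.data(), tokens.size(), &n_loaded)) {
        tokens.resize(n_loaded);
    }
    // ... evaluate the new prompt suffix and generate here ...
    llama_save_session_file(ctx, out_path, tokens.data(), tokens.size());
}
```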

And a few minor things:
– Why not allow running the evaluation with --n-predict 0 if I only want to cache the prompt? I have to specify --n-predict 1 just to cache the prompt, but that prevents me from reusing the (probably re-parsed?) output file, since it will contain an extra word at the end; I have to use the initial file instead.
– Why print an extra space at the beginning of stdout? If I later swap the files (feeding out.txt back as in.txt), the prompt will grow each time (two spaces, three spaces…) and most likely destroy the cache. I had to strip that space manually, but this feels odd. (A trivial wrapper-side guard is sketched below.)
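As for that leading space, a trivial wrapper-side guard (sketch) keeps the round-tripped prompt stable:

```cpp
#include <string>

// Wrapper-side sketch: strip the single leading space that main prints, so
// feeding out.txt back as in.txt does not grow the prompt by one space per
// round trip (which would invalidate the cached prefix).
std::string strip_leading_space(const std::string & s) {
    return (!s.empty() && s.front() == ' ') ? s.substr(1) : s;
}
```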
