Eval bug: NVIDIA Nemotron Nano 9B v2 thinking tokens not properly handled in the llama-server web ui #15673

Description

@Hoernchen

Name and Version

llama-server.exe --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XTX, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6323 (c950ec62)
built with AMD clang version 20.0.0git (https://github.com/ROCm/llvm-project.git 1b5ca053c4ff3f9e729db16d11ca998bbd65d7e3+PATCHED:826b8a17847378a096dff258bf54fc237336f0e4) for x86_64-pc-windows-msvc

Operating systems

Windows

GGML backends

HIP

Hardware

gfx1100

Models

https://huggingface.co/bartowski/nvidia_NVIDIA-Nemotron-Nano-9B-v2-GGUF/tree/main : nvidia_NVIDIA-Nemotron-Nano-9B-v2-Q8_0.gguf

Problem description & steps to reproduce

Just like #11861, the opening <think> tag is part of the prompt template, so the response only contains the closing </think> tag, which means the web UI cannot separate the thinking from the output.
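For illustration, here is a minimal sketch of the mismatch (hypothetical strings, not llama.cpp's actual parser): when the chat template already emits the opening tag, a parser that expects a full <think>...</think> pair in the reply finds nothing to split on.

```python
# Hypothetical strings for illustration; this is not llama.cpp's actual parser.
# The chat template already emits the opening <think> tag, so the model's
# reply starts mid-reasoning and only carries the closing tag.
model_reply = "Let me reason about this...\n</think>\nThe final answer."

def split_reasoning(reply: str) -> tuple[str, str]:
    """Split a reply into (reasoning, answer), expecting a full <think>...</think> pair."""
    start = reply.find("<think>")
    end = reply.find("</think>")
    if start == -1 or end == -1:
        return "", reply  # no complete pair: everything is treated as the answer
    return reply[start + len("<think>"):end], reply[end + len("</think>"):]

reasoning, answer = split_reasoning(model_reply)
print(repr(reasoning))  # '' -- the opening tag never appears in the reply
print(repr(answer))     # the reasoning text and the stray </think> leak into the answer
```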

To reproduce, just use the model with llama-server and look at the template output of llama.cpp\scripts\get_chat_template.py nvidia/NVIDIA-Nemotron-Nano-9B-v2 or at the server log messages.
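A quick way to inspect the raw reply is a direct request to the running server (sketch; assumes llama-server's default port 8080 and its OpenAI-compatible /v1/chat/completions endpoint):

```python
# Sketch: inspect the raw reply from a running llama-server.
# Assumes the default port 8080 and the OpenAI-compatible endpoint.
import json
import urllib.request

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({"messages": [{"role": "user", "content": "What is 2+2?"}]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    msg = json.load(resp)["choices"][0]["message"]

content = msg.get("content") or ""
print("has <think>: ", "<think>" in content)
print("has </think>:", "</think>" in content)
print("reasoning_content:", msg.get("reasoning_content"))  # stays empty when parsing fails
```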

First Bad Commit

No response

Relevant log output

llama-server.exe --jinja -b 4096 -fa -c 131072 -ngl 9999  --metrics -m llamacppmodels\nvidia_NVIDIA-Nemotron-Nano-9B-v2-Q8_0.gguf

(I gave --reasoning-format deepseek and manual template changes a try, wondering if I'm just holding it wrong.)
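Since neither helped, a client-side fallback (sketch only, not a proper fix) would be to treat everything before the first closing tag as reasoning:

```python
# Client-side fallback sketch, not a proper fix: if the reply carries only
# a closing tag, treat everything before the first </think> as reasoning.
def split_on_closing_tag(reply: str) -> tuple[str, str]:
    head, sep, tail = reply.partition("</think>")
    if not sep:
        return "", reply  # no tag at all: plain answer
    return head.strip(), tail.strip()

reasoning, answer = split_on_closing_tag("step 1... step 2...\n</think>\nThe answer is 4.")
assert reasoning == "step 1... step 2..."
assert answer == "The answer is 4."
```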
