Description
Name and Version
$ ./build/bin/llama-cli --version
version: 5358 (10d2af0)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Other (Please specify in the next section)
Command line
CPU: ./build/bin/llama-finetune --file ./gsm8k_with_newlines_first_125.txt --model smollm2-135M-base.gguf -b 512 -ub 512 -np 1 --device none
CUDA: CUDA_VISIBLE_DEVICES=0 ./build/bin/llama-finetune --file ./gsm8k_with_newlines_first_125.txt --model smollm2-135M-base.gguf -b 512 -ub 512 -np 1 -ngl 999
Problem description & steps to reproduce
Hey all,
I've noticed that the examples/training/finetune.cpp code yields different and worse results when using the CPU backend.
As a minimal example to demonstrate this, I used finetune.cpp in the following setup:
- Model: SmolLM2-135M Base (https://huggingface.co/HuggingFaceTB/SmolLM2-135M), converted to GGUF with llama.cpp/convert_hf_to_gguf.py using --outtype f32.
- Dataset: The first 125 samples from GSM8K (https://huggingface.co/datasets/openai/gsm8k), stored in a single newline-delimited file (attached below; see the sketch after this list for how it was put together). This small subset enables quick testing, since CPU training can be quite slow.
- Hyperparameters: Defaults used by the examples/training code, except for the following:
  - Epochs: increased from 2 to 10
  - Learning rate: increased from 1e-7 to 1e-6
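For reproducibility, the data file can be put together roughly as follows. This is a minimal sketch rather than the exact script I ran: it assumes the Hugging Face datasets package, and the precise question/answer formatting in the attached file may differ.

```python
# Sketch: write the first 125 GSM8K training samples to a newline-delimited text file.
# Assumes the Hugging Face "datasets" package; the exact question/answer formatting
# of the attached file may differ from what is written here.
from datasets import load_dataset

ds = load_dataset("openai/gsm8k", "main", split="train")

with open("gsm8k_with_newlines_first_125.txt", "w", encoding="utf-8") as f:
    for sample in ds.select(range(125)):
        f.write(sample["question"] + "\n" + sample["answer"] + "\n")
```

The resulting file is then passed to llama-finetune via --file, and the converted f32 GGUF via --model, as in the command lines above.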
I would expect SmolLM2-135M Base to quickly learn on (and maybe overfit) this small amount of data. Finetuning using the CUDA backend shows this is the case, with training loss decreasing over time.
I would expect similar behavior when finetuning with the CPU backend, since the model, dataset, and hyperparameters are all the same; the only difference is the backend. However, the results show that models finetuned on the CPU are not learning.
I believe there is a bug specific to the CPU backend of the finetuning code that is causing this discrepancy.
Results
I've attached the log files from both runs below:
llamacpp_finetune_smollm2135mbase_on_gsmk_125samples_cpu.log
llamacpp_finetune_smollm2135mbase_on_gsmk_125samples_gpu.log
I've attached charts showing this behavior below:
[charts: training loss over the course of training for the CPU and CUDA runs]
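The charts simply plot the training-loss values reported in the two logs above. A script along these lines can regenerate something like them; this is only a sketch, and the regular expression is an assumption about the log format that may need adjusting to match the actual lines:

```python
# Sketch: plot per-step training loss parsed out of the two finetune logs.
# The regex is an assumption about the log format and may need adjusting.
import re
import matplotlib.pyplot as plt

def read_losses(path):
    losses = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            m = re.search(r"loss[=:\s]+([0-9]+\.[0-9]+)", line)
            if m:
                losses.append(float(m.group(1)))
    return losses

for path, label in [
    ("llamacpp_finetune_smollm2135mbase_on_gsmk_125samples_cpu.log", "CPU"),
    ("llamacpp_finetune_smollm2135mbase_on_gsmk_125samples_gpu.log", "CUDA"),
]:
    plt.plot(read_losses(path), label=label)

plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.savefig("finetune_loss_cpu_vs_cuda.png")
```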
Data Used
I've attached the data used below:
gsm8k_with_newlines_first_125.txt
First Bad Commit
Commit 10d2af0