Merged

Temp #26

Changes from all commits (72 commits)
b756441
metal : minor code formatting
ggerganov Nov 25, 2024
f6d12e7
tests : fix compile warning
ggerganov Nov 25, 2024
5931c1f
ggml : add support for dynamic loading of backends (#10469)
slaren Nov 25, 2024
9ca2e67
server : add speculative decoding support (#10455)
ggerganov Nov 25, 2024
a9a678a
Add download chat feature to server chat (#10481)
brucepro Nov 25, 2024
1f92225
Github: update issue templates [no ci] (#10489)
JohannesGaessler Nov 25, 2024
10bce04
llama : accept a list of devices to use to offload a model (#10497)
slaren Nov 25, 2024
80acb7b
Rename Olmo1124 to Olmo2 (#10500)
2015aroras Nov 25, 2024
106964e
metal : enable mat-vec kernels for bs <= 4 (#10491)
ggerganov Nov 25, 2024
47f931c
server : enable cache_prompt by default (#10501)
ggerganov Nov 25, 2024
9fd8c26
server : add more information about error (#10455)
ggerganov Nov 25, 2024
50d5cec
ci : build docker images only once daily (#10503)
slaren Nov 25, 2024
0cc6375
Introduce llama-run (#10291)
ericcurtin Nov 25, 2024
0eb4e12
vulkan: Fix a vulkan-shaders-gen argument parsing error (#10484)
sparkleholic Nov 26, 2024
7066b4c
CANN: RoPE and CONCAT operator optimization (#10488)
noemotiovon Nov 26, 2024
9a4b79b
CANN: Improve the Inferencing Performance for Ascend NPU Device (#10454)
shen-shanshan Nov 26, 2024
811872a
speculative : simplify the implementation (#10504)
ggerganov Nov 26, 2024
84e1c33
server : fix parallel speculative decoding (#10513)
ggerganov Nov 26, 2024
25669aa
ggml-cpu: cmake add arm64 cpu feature check for macos (#10487)
chaxu01 Nov 26, 2024
c6807b3
ci : add ubuntu cuda build, build with one arch on windows (#10456)
slaren Nov 26, 2024
7db3846
ci : publish the docker images created during scheduled runs (#10515)
slaren Nov 26, 2024
ab96610
cmake : enable warnings in llama (#10474)
ggerganov Nov 26, 2024
0bbd226
restore the condition to build & update package on merge (#10507)
NeoZhangJianyu Nov 26, 2024
45abe0f
server : replace behave with pytest (#10416)
ngxson Nov 26, 2024
904109e
vulkan: fix group_norm (#10496)
jeffbolznv Nov 26, 2024
249cd93
mtgpu: Add MUSA_DOCKER_ARCH in Dockerfiles && update cmake and make (…
yeahdongcn Nov 26, 2024
be0e350
Fix HIP flag inconsistency & build docs (#10524)
tristandruyen Nov 26, 2024
30ec398
llama : disable warnings for 3rd party sha1 dependency (#10527)
slaren Nov 26, 2024
5a349f2
ci : remove nix workflows (#10526)
slaren Nov 26, 2024
de50973
Add OLMo 2 model in docs (#10530)
2015aroras Nov 26, 2024
c9b00a7
ci : fix cuda releases (#10532)
slaren Nov 26, 2024
4a57d36
vulkan: optimize Q2_K and Q3_K mul_mat_vec (#10459)
jeffbolznv Nov 27, 2024
71a6498
vulkan: skip integer div/mod in get_offsets for batch_idx==0 (#10506)
jeffbolznv Nov 27, 2024
249a790
vulkan: further optimize q5_k mul_mat_vec (#10479)
jeffbolznv Nov 27, 2024
5b3466b
vulkan: Handle GPUs with less shared memory (#10468)
jeffbolznv Nov 27, 2024
c31ed2a
vulkan: define all quant data structures in types.comp (#10440)
jeffbolznv Nov 27, 2024
9150f8f
Do not include arm_neon.h when compiling CUDA code (ggml/1028)
frankier Nov 26, 2024
fee824a
sync : ggml
ggerganov Nov 27, 2024
9e2301f
metal : fix group_norm support condition (#0)
ggerganov Nov 27, 2024
46c69e0
ci : faster CUDA toolkit installation method and use ccache (#10537)
slaren Nov 27, 2024
3ad5451
Add some minimal optimizations for CDNA (#10498)
IMbackK Nov 27, 2024
9f91251
common : fix duplicated file name with hf_repo and hf_file (#10550)
ngxson Nov 27, 2024
b742013
CANN: ROPE operator optimization (#10540)
noemotiovon Nov 28, 2024
605fa66
CANN: Fix SOC_TYPE compile bug (#10519)
leo-pony Nov 28, 2024
c6bc739
CANN: Update cann.md to display correctly in CLion (#10538)
HRXWEB Nov 28, 2024
2025fa6
kompute : improve backend to pass test_backend_ops (#10542)
slp Nov 28, 2024
c202cef
ggml-cpu: support IQ4_NL_4_4 by runtime repack (#10541)
FanShupei Nov 28, 2024
eea986f
cmake : fix ARM feature detection (#10543)
ggerganov Nov 28, 2024
76b27d2
ggml : fix row condition for i8mm kernels (#10561)
ggerganov Nov 28, 2024
e90688e
ci : fix tag name in cuda and hip releases (#10566)
slaren Nov 28, 2024
7281cf1
docs: fix outdated usage of llama-simple (#10565)
rand-fly Nov 28, 2024
8907193
common: fix warning message when no GPU found (#10564)
JohannesGaessler Nov 28, 2024
6c59567
server : (tests) don't use thread for capturing stdout/stderr, bump o…
ngxson Nov 28, 2024
4c0a95b
llama : add missing model types
ggerganov Nov 28, 2024
dc22344
ggml : remove redundant copyright notice + update authors
ggerganov Nov 28, 2024
678d799
llava: return false instead of exit (#10546)
tinglou Nov 29, 2024
f095a64
vulkan: get the first command buffer submitted sooner (#10499)
jeffbolznv Nov 29, 2024
938f608
CANN: RoPE operator optimization (#10563)
noemotiovon Nov 29, 2024
266b851
sycl : Reroute permuted mul_mats through oneMKL (#10408)
Alcpz Nov 29, 2024
0f77aae
sycl : offload of get_rows set to 0 (#10432)
Alcpz Nov 29, 2024
4b3242b
ggml-cpu: fix typo in gemv/gemm iq4_nl_4_4 (#10580)
FanShupei Nov 29, 2024
f0678c5
ggml : fix I8MM Q4_1 scaling factor conversion (#10562)
ggerganov Nov 29, 2024
a3a3048
cleanup UI link list (#10577)
slaren Nov 29, 2024
3a8e9af
imatrix : support combine-only (#10492)
robbiemu Nov 29, 2024
b782e5c
server : add more test cases (#10569)
ngxson Nov 29, 2024
7cc2d2c
ggml : move AMX to the CPU backend (#10570)
slaren Nov 29, 2024
0533e7f
vulkan: Dynamic subgroup size support for Q6_K mat_vec (#10536)
netrunnereve Nov 30, 2024
abadba0
readme : refresh (#10587)
ggerganov Nov 30, 2024
3e0ba0e
readme : remove old badge
ggerganov Nov 30, 2024
0c39f44
ggml-cpu: replace AArch64 NEON assembly with intrinsics in ggml_gemv_…
angt Nov 30, 2024
43957ef
build: update Makefile comments for C++ version change (#10598)
wangqin0 Dec 1, 2024
cf80952
Merge branch 'master' into Temp
apicalshark Dec 1, 2024
2 changes: 2 additions & 0 deletions .clang-tidy
@@ -17,8 +17,10 @@ Checks: >
-clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
performance-*,
portability-*,
-portability-simd-intrinsics,
misc-*,
-misc-const-correctness,
-misc-non-private-member-variables-in-classes,
-misc-no-recursion,
-misc-use-anonymous-namespace,
FormatStyle: none
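
For anyone trying the updated checks locally, a minimal invocation could look like the following sketch. It assumes a CMake build directory named build with a compile database enabled, and the source file path is only an example:

    # Generate compile_commands.json, then run clang-tidy against one file.
    # The repo's .clang-tidy (with the checks above) is picked up automatically.
    cmake -B build -DCMAKE_EXPORT_COMPILE_COMMANDS=ON
    clang-tidy -p build src/llama.cpp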
9 changes: 8 additions & 1 deletion .devops/full-musa.Dockerfile
@@ -6,6 +6,9 @@ ARG BASE_MUSA_DEV_CONTAINER=mthreads/musa:${MUSA_VERSION}-devel-ubuntu${UBUNTU_V

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

# MUSA architecture to build for (defaults to all supported archs)
ARG MUSA_DOCKER_ARCH=default

RUN apt-get update && \
apt-get install -y build-essential cmake python3 python3-pip git libcurl4-openssl-dev libgomp1

@@ -19,7 +22,11 @@ WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
# Use the default MUSA archs if not specified
RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
fi && \
cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release -j$(nproc) && \
cp build/bin/* .

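The new MUSA_DOCKER_ARCH argument makes it possible to build the image for a single MUSA architecture instead of all supported ones; the same pattern is applied to the llama-cli and llama-server Dockerfiles below. A usage sketch follows. The architecture value is a placeholder, not a verified arch name, so check the MUSA toolchain documentation for valid values:

    # Build only for one MUSA architecture; "21" is a placeholder value.
    docker build -f .devops/full-musa.Dockerfile \
        --build-arg MUSA_DOCKER_ARCH=21 \
        -t llama-cpp-musa-full .

Leaving MUSA_DOCKER_ARCH at its default keeps CMAKE_ARGS unset, so the build falls back to all supported architectures.
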
9 changes: 8 additions & 1 deletion .devops/llama-cli-musa.Dockerfile
@@ -8,14 +8,21 @@ ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

# MUSA architecture to build for (defaults to all supported archs)
ARG MUSA_DOCKER_ARCH=default

RUN apt-get update && \
apt-get install -y build-essential git cmake

WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
# Use the default MUSA archs if not specified
RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
fi && \
cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release --target llama-cli -j$(nproc) && \
mkdir -p /app/lib && \
find build -name "*.so" -exec cp {} /app/lib \;
9 changes: 8 additions & 1 deletion .devops/llama-server-musa.Dockerfile
@@ -8,14 +8,21 @@ ARG BASE_MUSA_RUN_CONTAINER=mthreads/musa:${MUSA_VERSION}-runtime-ubuntu${UBUNTU

FROM ${BASE_MUSA_DEV_CONTAINER} AS build

# MUSA architecture to build for (defaults to all supported archs)
ARG MUSA_DOCKER_ARCH=default

RUN apt-get update && \
apt-get install -y build-essential git cmake libcurl4-openssl-dev

WORKDIR /app

COPY . .

RUN cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
# Use the default MUSA archs if not specified
RUN if [ "${MUSA_DOCKER_ARCH}" != "default" ]; then \
export CMAKE_ARGS="-DMUSA_ARCHITECTURES=${MUSA_DOCKER_ARCH}"; \
fi && \
cmake -B build -DGGML_NATIVE=OFF -DGGML_MUSA=ON -DLLAMA_CURL=ON ${CMAKE_ARGS} -DCMAKE_EXE_LINKER_FLAGS=-Wl,--allow-shlib-undefined . && \
cmake --build build --config Release --target llama-server -j$(nproc) && \
mkdir -p /app/lib && \
find build -name "*.so" -exec cp {} /app/lib \;
2 changes: 1 addition & 1 deletion .devops/nix/python-scripts.nix
@@ -34,7 +34,7 @@ let

# server tests
openai
behave
pytest
prometheus-client
];
in
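With behave replaced by pytest, the server tests are now driven by pytest directly. A minimal sketch, assuming the tests live under examples/server/tests and that the test dependencies are listed in a requirements.txt there (both are assumptions; adjust to the actual layout):

    # Run the pytest-based server test suite after building llama-server.
    cd examples/server/tests
    pip install -r requirements.txt  # assumed dependency file; now pulls in pytest instead of behave
    pytest -v
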
12 changes: 8 additions & 4 deletions .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -24,7 +24,8 @@ body:
- type: dropdown
id: operating-system
attributes:
label: Which operating systems do you know to be affected?
label: Operating systems
description: Which operating systems do you know to be affected?
multiple: true
options:
- Linux
@@ -41,14 +42,17 @@
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CPU, CUDA, HIP, Kompute, Metal, Musa, RPC, SYCL, Vulkan]
multiple: true
validations:
required: true
- type: textarea
id: steps_to_reproduce
id: info
attributes:
label: Steps to Reproduce
label: Problem description & steps to reproduce
description: >
Please tell us how to reproduce the bug and any additional information that you think could be useful for fixing it.
Please give us a summary of the problem and tell us how to reproduce it.
If you can narrow down the bug to specific compile flags, that information would be very much appreciated by us.
placeholder: >
I'm trying to compile llama.cpp with CUDA support on a fresh install of Ubuntu and get error XY.
Here are the exact commands that I used: ...
validations:
required: true
15 changes: 9 additions & 6 deletions .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -26,7 +26,8 @@ body:
- type: dropdown
id: operating-system
attributes:
label: Which operating systems do you know to be affected?
label: Operating systems
description: Which operating systems do you know to be affected?
multiple: true
options:
- Linux
@@ -43,6 +44,8 @@
description: Which GGML backends do you know to be affected?
options: [AMX, BLAS, CPU, CUDA, HIP, Kompute, Metal, Musa, RPC, SYCL, Vulkan]
multiple: true
validations:
required: true
- type: textarea
id: hardware
attributes:
@@ -55,20 +58,20 @@
- type: textarea
id: model
attributes:
label: Model
label: Models
description: >
Which model at which quantization were you using when encountering the bug?
Which model(s) at which quantization were you using when encountering the bug?
If you downloaded a GGUF file off of Huggingface, please provide a link.
placeholder: >
e.g. Meta LLaMA 3.1 Instruct 8b q4_K_M
validations:
required: false
- type: textarea
id: steps_to_reproduce
id: info
attributes:
label: Steps to Reproduce
label: Problem description & steps to reproduce
description: >
Please tell us how to reproduce the bug and any additional information that you think could be useful for fixing it.
Please give us a summary of the problem and tell us how to reproduce it.
If you can narrow down the bug to specific hardware, compile flags, or command line arguments,
that information would be very much appreciated by us.
placeholder: >
23 changes: 13 additions & 10 deletions .github/ISSUE_TEMPLATE/019-bug-misc.yml
@@ -14,7 +14,7 @@ body:
id: version
attributes:
label: Name and Version
description: Which version of our software are you running? (use `--version` to get a version string)
description: Which version of our software is affected? (You can use `--version` to get a version string.)
placeholder: |
$./llama-cli --version
version: 2999 (42b4109e)
@@ -24,7 +24,8 @@
- type: dropdown
id: operating-system
attributes:
label: Which operating systems do you know to be affected?
label: Operating systems
description: Which operating systems do you know to be affected?
multiple: true
options:
- Linux
@@ -33,36 +34,38 @@
- BSD
- Other? (Please let us know in description)
validations:
required: true
required: false
- type: dropdown
id: module
attributes:
label: Which llama.cpp modules do you know to be affected?
multiple: true
options:
- Documentation/Github
- libllama (core library)
- llama-cli
- llama-server
- llama-bench
- llama-quantize
- Python/Bash scripts
- Test code
- Other (Please specify in the next section)
validations:
required: true
required: false
- type: textarea
id: steps_to_reproduce
id: info
attributes:
label: Steps to Reproduce
label: Problem description & steps to reproduce
description: >
Please tell us how to reproduce the bug and any additional information that you think could be useful for fixing it.
Please give us a summary of the problem and tell us how to reproduce it (if applicable).
validations:
required: true
- type: textarea
id: first_bad_commit
attributes:
label: First Bad Commit
description: >
If the bug was not present on an earlier version: when did it start appearing?
If the bug was not present on an earlier version and it's not trivial to track down: when did it start appearing?
If possible, please do a git bisect and identify the exact commit that introduced the bug.
validations:
required: false
@@ -71,8 +74,8 @@
attributes:
label: Relevant log output
description: >
Please copy and paste any relevant log output, including the command that you entered and any generated text.
If applicable, please copy and paste any relevant log output, including the command that you entered and any generated text.
This will be automatically formatted into code, so no need for backticks.
render: shell
validations:
required: true
required: false
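
The First Bad Commit field above asks reporters to run a git bisect when feasible. A sketch of that workflow, with a placeholder for the known-good revision:

    git bisect start
    git bisect bad HEAD       # the current build exhibits the bug
    git bisect good b4000     # placeholder: last known-good tag or commit
    # rebuild and retest at each revision git checks out, then mark it:
    git bisect good           # or: git bisect bad
    git bisect reset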
15 changes: 5 additions & 10 deletions .github/labeler.yml
@@ -3,19 +3,18 @@ Kompute:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-kompute.h
- ggml/src/ggml-kompute.cpp
- ggml/src/ggml-kompute/**
- README-kompute.md
Apple Metal:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-metal.h
- ggml/src/ggml-metal.cpp
- ggml/src/ggml-metal/**
- README-metal.md
SYCL:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-sycl.h
- ggml/src/ggml-sycl.cpp
- ggml/src/ggml-sycl/**
- docs/backend/SYCL.md
- examples/sycl/**
@@ -27,8 +26,8 @@ Nvidia GPU:
Vulkan:
- changed-files:
- any-glob-to-any-file:
- ggml/ggml_vk_generate_shaders.py
- ggml/src/ggml-vulkan*
- ggml/include/ggml-vulkan.h
- ggml/src/ggml-vulkan/**
documentation:
- changed-files:
- any-glob-to-any-file:
@@ -75,11 +74,7 @@ server:
ggml:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml*.h
- ggml/src/ggml*.c
- ggml/src/ggml*.cpp
- ggml/src/ggml*.h
- ggml-cuda/**
- ggml/**
nix:
- changed-files:
- any-glob-to-any-file: