Skip to content

Conversation

hipudding
Copy link
Collaborator

  1. Fix precision issue on 310I DUP with multi-devices, P2P transfor is not suite for this device.
  2. Enable NZ format by default.
  3. Enable acl_graph by default.

Make sure to read the contributing guidelines before submitting a PR

@hipudding hipudding self-assigned this Sep 4, 2025
@hipudding hipudding added ggml changes relating to the ggml tensor library for machine learning Ascend NPU issues specific to Ascend NPUs labels Sep 4, 2025
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Sep 4, 2025
Copy link
Collaborator

@noemotiovon noemotiovon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@noemotiovon noemotiovon merged commit 5421f63 into ggml-org:master Sep 4, 2025
90 of 91 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 4, 2025
…upport

* origin/master: (72 commits)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
vulkan : update ggml_vk_instance_validation_ext_available (ggml-org#15666)
ggml vulkan: add hardsigmoid and hardswish operations (ggml-org#15762)
CUDA: Optimize `rms_norm_f32` kernel and its fused variants, giving 1-6% perf E2E (ggml-org#15715)
model-conversion : fix pyright errors (ggml-org#15770)
sampling : optimize dist sampler (ggml-org#15704)
llama : fix incorrect model type for Gemma 270M (ggml-org#15764)
model-conversion : remove hardcoded /bin/bash shebangs [no ci] (ggml-org#15765)
CANN: Add RoPE contiguous check for 310I DUP device (ggml-org#15735)
ggml-cpu : optimize RVV kernels (ggml-org#15720)
...
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request Sep 5, 2025
…g-model-disabled-agent-prefill

* origin/master: (84 commits)
CUDA: fastdiv, launch bounds for mmvq + q8_1 quant (ggml-org#15802)
tests : add --list-ops and --show-coverage options (ggml-org#15745)
gguf: gguf_writer refactor (ggml-org#15691)
kv-cache : fix SWA checks + disable cacheless iSWA (ggml-org#15811)
model-conversion : add --embeddings flag to modelcard.template [no ci] (ggml-org#15801)
chat : fixed crash when Hermes 2 <tool_call> had a newline before it (ggml-org#15639)
chat : nemotron thinking & toolcalling support (ggml-org#15676)
scripts : add Jinja tester PySide6 simple app (ggml-org#15756)
llama : add support for EmbeddingGemma 300m (ggml-org#15798)
metal : Add template specialization for mul_mm_id w/ ne20 == 10 (ggml-org#15799)
llama : set n_outputs to 1 to avoid 0 outputs mean-pooling (ggml-org#15791)
CANN: Refactor ND to NZ workspace to be per-device (ggml-org#15763)
server: add exceed_context_size_error type (ggml-org#15780)
Document the new max GPU layers default in help (ggml-org#15771)
ggml: add ops for WAN video model (cuda && cpu) (ggml-org#15669)
CANN: Fix precision issue on 310I DUO multi-devices (ggml-org#15784)
opencl: add hs=40 to FA (ggml-org#15758)
CANN: fix acl_rstd allocation size in ggml_cann_rms_norm (ggml-org#15760)
vulkan: fix mmv subgroup16 selection (ggml-org#15775)
vulkan: don't use std::string in load_shaders, to improve compile time (ggml-org#15724)
...
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ascend NPU issues specific to Ascend NPUs documentation Improvements or additions to documentation ggml changes relating to the ggml tensor library for machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants