
Conversation

hongkongkiwi

Summary

  • Adds load_from_splits() method to LlamaModel for loading models split across multiple files
  • Adds split_path() utility function to generate standardized split file paths
  • Adds split_prefix() utility function to extract prefix from split file paths
  • Includes comprehensive example demonstrating split model usage

Features Added

  • Split Model Loading: Load large models split across multiple GGUF files
  • Path Utilities: Helper functions for working with split file naming conventions
  • Complete Example: Working example with command-line interface and auto-detection

API Changes

  • LlamaModel::load_from_splits(&backend, &[impl AsRef<Path>], &params) -> Result<Self, LlamaModelLoadError>
  • LlamaModel::split_path(prefix: &str, split_no: i32, split_count: i32) -> String
  • LlamaModel::split_prefix(split_path: &str, split_no: i32, split_count: i32) -> Option<String>
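
For orientation, a minimal usage sketch built only from the signatures above (not the PR's actual example code). The module paths follow the published llama-cpp-2 crate, the binary name mirrors this PR's `split_model` example, and `split_no` is assumed 0-indexed as in upstream llama.cpp; all of these may differ slightly on this branch.

```rust
// Minimal sketch: load one logical model from its split GGUF files.
use llama_cpp_2::llama_backend::LlamaBackend;
use llama_cpp_2::model::{params::LlamaModelParams, LlamaModel};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = LlamaBackend::init()?;
    let params = LlamaModelParams::default();

    // Take the split files from the command line, e.g.
    //   split_model model-00001-of-00002.gguf model-00002-of-00002.gguf
    let paths: Vec<String> = std::env::args().skip(1).collect();

    // Load a single model from all of its splits.
    let _model = LlamaModel::load_from_splits(&backend, &paths, &params)?;
    println!("loaded model from {} split file(s)", paths.len());

    // The path helpers generate and recover the standardized names
    // (split_no assumed 0-indexed here, matching llama.cpp).
    let first = LlamaModel::split_path("model", 0, 2);
    assert_eq!(
        LlamaModel::split_prefix(&first, 0, 2).as_deref(),
        Some("model")
    );
    Ok(())
}
```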

Test Plan

  • Builds successfully with no compilation errors
  • API matches upstream llama.cpp functions
  • Example compiles and runs without warnings
  • All documentation passes linting checks

Technical Details

The implementation uses the underlying llama_model_load_from_splits, llama_split_path, and llama_split_prefix functions from llama.cpp, providing safe Rust wrappers with proper error handling and memory management.
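
As a rough illustration of that wrapping pattern (not the PR's actual code), a safe `split_path` wrapper over the C API might look like the sketch below. It assumes llama-cpp-sys-2 exposes `llama_split_path` with the usual bindgen mapping (`size_t` → `usize`, `int` → `i32`) and that the function returns the number of bytes written; the buffer size is arbitrary.

```rust
// Illustrative only: a buffer-based safe wrapper over llama_split_path.
use std::ffi::CString;
use std::os::raw::c_char;

pub fn split_path(prefix: &str, split_no: i32, split_count: i32) -> String {
    let prefix_c = CString::new(prefix).expect("prefix must not contain NUL");
    // llama_split_path writes the formatted path into a caller-provided
    // buffer and returns its length.
    let mut buf = vec![0u8; 1024];
    let len = unsafe {
        llama_cpp_sys_2::llama_split_path(
            buf.as_mut_ptr() as *mut c_char,
            buf.len(),
            prefix_c.as_ptr(),
            split_no,
            split_count,
        )
    };
    buf.truncate(len as usize);
    String::from_utf8(buf).expect("llama.cpp returned invalid UTF-8")
}
```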

🤖 Generated with Claude Code

hongkongkiwi and others added 3 commits August 23, 2025 16:26
This commit introduces comprehensive support for loading models from multiple split files:

- Added `load_from_splits()` method to LlamaModel for loading models split across multiple files
- Added utility functions `split_path()` and `split_prefix()` for working with split file naming conventions
- Added split_model example demonstrating usage of the split loading functionality
- Updated workspace Cargo.toml to include the new split_model example

This feature enables loading very large models that have been split due to filesystem
limitations or distribution requirements.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Remove unused Path import from split_model example
- Remove RPC example from workspace members on split-model-loading branch

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Added documentation comments for RopeType enum variants
- Ensures all public APIs are properly documented

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@hongkongkiwi
Author

This basically adds missing functionality to the Rust library that's present in llama.cpp's tools dir.

I think it's worth keeping feature parity so all llama.cpp features can be used.

…r split-model-loading

This commit adds the scalable dynamic tools building system to the split-model-loading branch:

- Adds generate_tools_cmake() function to dynamically create tools/CMakeLists.txt
- Only builds tools for enabled features (solving the issue raised in PR utilityai#806)
- Split model loading doesn't require tools but maintains architecture consistency
- Includes tools/CMakeLists.txt in Cargo.toml for build system compatibility
- Uses feature-based conditional compilation for future extensibility

This creates a merge-friendly architecture where each feature branch can extend
tool building without conflicts.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
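
For context on the mechanism this commit describes, a build.rs sketch of feature-gated generation of `tools/CMakeLists.txt` might look like the following. Only the `generate_tools_cmake()` name and the output file come from the commit; the feature name and CMake contents are illustrative.

```rust
// Hypothetical build.rs sketch of feature-gated tools generation.
use std::fs;
use std::path::Path;

fn generate_tools_cmake(tools_dir: &Path) -> std::io::Result<()> {
    let mut cmake = String::from("# Auto-generated by build.rs; do not edit.\n");
    // Cargo exports CARGO_FEATURE_<NAME> for every enabled feature, so each
    // feature branch can append its own tool without touching the others.
    if std::env::var("CARGO_FEATURE_RPC").is_ok() {
        cmake.push_str("add_subdirectory(rpc)\n");
    }
    fs::write(tools_dir.join("CMakeLists.txt"), cmake)
}
```
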
@MarcusDunn
Contributor

This seems to include #810; please separate these PRs properly.
