This repository was archived by the owner on Sep 23, 2025. It is now read-only.
Merged
37 commits
3de6016  mv path (KepingYan, Feb 8, 2024)
4c95129  modify import path (KepingYan, Feb 8, 2024)
19d7fc2  modify package name (KepingYan, Feb 8, 2024)
c1394d4  update path (KepingYan, Feb 20, 2024)
ad9a55a  update (KepingYan, Feb 20, 2024)
4724b44  disable mpt-7b-bigdl (KepingYan, Feb 21, 2024)
3024036  update (KepingYan, Feb 21, 2024)
188e495  update for ui (KepingYan, Feb 21, 2024)
4a63f34  modify llmonray to llm_on_ray (KepingYan, Feb 23, 2024)
ac3cb59  simply execution command (KepingYan, Feb 23, 2024)
c70c0ff  merge main branch (KepingYan, Feb 23, 2024)
0d36d0e  test (KepingYan, Feb 23, 2024)
961c176  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Feb 23, 2024)
f83ee7d  test (KepingYan, Feb 23, 2024)
1916e20  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Feb 23, 2024)
0219eeb  modify (KepingYan, Feb 23, 2024)
32e990e  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Feb 26, 2024)
77055bd  fix (KepingYan, Feb 26, 2024)
91a8429  update & disable vllm tempeorary (KepingYan, Feb 26, 2024)
ce2f019  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Feb 26, 2024)
f8c59d3  test (KepingYan, Feb 27, 2024)
a6f1db6  test (KepingYan, Feb 27, 2024)
61731b9  test (KepingYan, Feb 27, 2024)
c50ffea  recover (KepingYan, Feb 27, 2024)
883c9eb  update (KepingYan, Feb 27, 2024)
6a07499  fix vllm (KepingYan, Feb 28, 2024)
4a16df0  update (KepingYan, Feb 29, 2024)
fd2e56e  merge main branch (KepingYan, Feb 29, 2024)
43af195  move mllm path (KepingYan, Feb 29, 2024)
93c4918  modify (KepingYan, Mar 5, 2024)
5933105  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Mar 5, 2024)
e51b244  fix err (KepingYan, Mar 5, 2024)
e055d98  remove import_all_modules (KepingYan, Mar 6, 2024)
3b0a89b  Merge remote-tracking branch 'upstream/main' into fix_package_path (KepingYan, Mar 6, 2024)
af9a299  Update .github/workflows/workflow_finetune.yml (xwu-intel, Mar 7, 2024)
a2e6c57  add comment (KepingYan, Mar 7, 2024)
c741c01  add comment (KepingYan, Mar 7, 2024)
12 changes: 6 additions & 6 deletions .github/workflows/workflow_finetune.yml
@@ -85,7 +85,7 @@ jobs:
docker exec "finetune" bash -c "source \$(python -c 'import oneccl_bindings_for_pytorch as torch_ccl;print(torch_ccl.cwd)')/env/setvars.sh; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --head --node-ip-address 127.0.0.1 --ray-debugger-external; RAY_SERVE_ENABLE_EXPERIMENTAL_STREAMING=1 ray start --address='127.0.0.1:6379' --ray-debugger-external"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['base_model'] = "${{ matrix.model }}"
@@ -113,14 +113,14 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"

- name: Run PEFT-LoRA Test
run: |
docker exec "finetune" bash -c "rm -rf /tmp/llm-ray/*"
CMD=$(cat << EOF
import yaml
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -138,7 +138,7 @@ jobs:
EOF
)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"

- name: Run Deltatuner Test on DENAS-LoRA Model
run: |
@@ -150,7 +150,7 @@ jobs:
import os
import yaml
os.system("cp -r $(python -m pip show deltatuner | grep Location | cut -d: -f2)/deltatuner/conf/best_structure examples/")
conf_path = "finetune/finetune.yaml"
conf_path = "llm_on_ray/finetune/finetune.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['General']['lora_config'] = {
@@ -168,7 +168,7 @@ jobs:
yaml.dump(result, output, sort_keys=False)
EOF)
docker exec "finetune" python -c "$CMD"
docker exec "finetune" bash -c "python finetune/finetune.py --config_file finetune/finetune.yaml"
docker exec "finetune" bash -c "llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml"
fi

- name: Stop Ray
28 changes: 11 additions & 17 deletions .github/workflows/workflow_inference.yml
@@ -118,14 +118,14 @@ jobs:
CMD=$(cat << EOF
import yaml
if ("${{ matrix.model }}" == "starcoder"):
conf_path = "inference/models/starcoder.yaml"
conf_path = "llm_on_ray/inference/models/starcoder.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
with open(conf_path, 'w') as output:
yaml.dump(result, output, sort_keys=False)
if ("${{ matrix.model }}" == "llama-2-7b-chat-hf"):
conf_path = "inference/models/llama-2-7b-chat-hf.yaml"
conf_path = "llm_on_ray/inference/models/llama-2-7b-chat-hf.yaml"
with open(conf_path, encoding="utf-8") as reader:
result = yaml.load(reader, Loader=yaml.FullLoader)
result['model_description']["config"]["use_auth_token"] = "${{ env.HF_ACCESS_TOKEN }}"
@@ -135,11 +135,11 @@ jobs:
)
docker exec "${TARGET}" python -c "$CMD"
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml --simple"
elif [[ ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/llama-2-7b-chat-hf-vllm-fp32.yaml --simple"
else
docker exec "${TARGET}" bash -c "python inference/serve.py --simple --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --simple --models ${{ matrix.model }}"
fi
echo Non-streaming query:
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
@@ -150,7 +150,7 @@ jobs:
if: ${{ matrix.dtuner_model }}
run: |
TARGET=${{steps.target.outputs.target}}
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"

@@ -160,8 +160,8 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python .github/workflows/config/update_inference_config.py --config_file llm_on_ray/inference/models/\"${{ matrix.model }}\".yaml --output_file \"${{ matrix.model }}\".yaml.deepspeed --deepspeed"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file \"${{ matrix.model }}\".yaml.deepspeed --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -173,7 +173,7 @@ jobs:
if [[ ${{ matrix.model }} =~ ^(gpt2|falcon-7b|starcoder|mpt-7b.*)$ ]]; then
echo ${{ matrix.model }} is not supported!
else
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file .github/workflows/config/mpt_deltatuner_deepspeed.yaml --simple"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_simple/query_single.py --model_endpoint http://127.0.0.1:8000/${{ matrix.model }} --streaming_response"
fi
@@ -182,9 +182,9 @@ jobs:
run: |
TARGET=${{steps.target.outputs.target}}
if [[ ${{ matrix.model }} == "mpt-7b-bigdl" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --config_file inference/models/bigdl/mpt-7b-bigdl.yaml"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --config_file llm_on_ray/inference/models/bigdl/mpt-7b-bigdl.yaml"
elif [[ ! ${{ matrix.model }} == "llama-2-7b-chat-hf-vllm" ]]; then
docker exec "${TARGET}" bash -c "python inference/serve.py --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "llm_on_ray-serve --models ${{ matrix.model }}"
docker exec "${TARGET}" bash -c "python examples/inference/api_server_openai/query_http_requests.py --model_name ${{ matrix.model }}"
fi

@@ -202,9 +202,3 @@ jobs:
TARGET=${{steps.target.outputs.target}}
cid=$(docker ps -q --filter "name=${TARGET}")
if [[ ! -z "$cid" ]]; then docker stop $cid && docker rm $cid; fi

- name: Test Summary
run: echo "to be continued"



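The streaming and non-streaming checks above exercise the simple endpoint through `examples/inference/api_server_simple/query_single.py`. A rough sketch of the kind of request such a check sends, assuming an illustrative model route and payload (the client script itself is not part of this diff):

```python
# Illustrative sketch only: querying a model deployed with `llm_on_ray-serve --simple`.
# The route (/gpt2) and the payload fields are assumptions, not taken from this PR;
# see examples/inference/api_server_simple/query_single.py for the actual client.
import requests

endpoint = "http://127.0.0.1:8000/gpt2"  # hypothetical model route
payload = {
    "text": "What is Ray Serve?",
    "config": {"max_new_tokens": 64},  # assumed generation parameters
    "stream": False,
}

resp = requests.post(endpoint, json=payload, timeout=60)
resp.raise_for_status()  # raise if the endpoint returned an HTTP error
print(resp.text)
```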
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_merge.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
8 changes: 4 additions & 4 deletions .github/workflows/workflow_orders_on_pr.yml
@@ -7,11 +7,11 @@ on:
paths:
- '.github/**'
- 'docker/**'
- 'common/**'
- 'dev/docker/**'
- 'finetune/**'
- 'inference/**'
- 'rlhf/**'
- 'llm_on_ray/common/**'
- 'llm_on_ray/finetune/**'
- 'llm_on_ray/inference/**'
- 'llm_on_ray/rlhf/**'
- 'tools/**'
- 'pyproject.toml'
- 'tests/**'
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
rev: v0.0.289
hooks:
- id: ruff
args: [ --fix, --exit-non-zero-on-fix, --ignore=E402, --ignore=E501, --ignore=E731]
args: [ --fix, --exit-non-zero-on-fix, --ignore=E402, --ignore=E501, --ignore=E731, --ignore=F401]

# Black needs to be ran after ruff with --fix
- repo: https://github.com/psf/black
8 changes: 4 additions & 4 deletions README.md
@@ -37,7 +37,7 @@ LLM-on-Ray's modular workflow structure is designed to comprehensively cater to
This guide will assist you in setting up LLM-on-Ray on Intel CPU locally, covering the initial setup, finetuning models, and deploying them for serving.
### Setup

#### 1. Clone the repository and install dependencies.
#### 1. Clone the repository, install llm-on-ray and its dependencies.
Software requirement: Git and Conda
```bash
git clone https://github.com/intel/llm-on-ray.git
@@ -62,14 +62,14 @@ ray start --head
Use the following command to finetune a model using an example dataset and default configurations. The finetuned model will be stored in `/tmp/llm-ray/output` by default. To customize the base model, dataset and configurations, please see the [finetuning document](#finetune):

```bash
python finetune/finetune.py --config_file finetune/finetune.yaml
llm_on_ray-finetune --config_file llm_on_ray/finetune/finetune.yaml
```

### Serving
Deploy a model on Ray and expose an endpoint for serving. This command uses GPT2 as an example, but more model configuration examples can be found in the [inference/models](inference/models) directory:

```bash
python inference/serve.py --config_file inference/models/gpt2.yaml
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml
```

The default served method is to provide an OpenAI-compatible API server ([OpenAI API Reference](https://platform.openai.com/docs/api-reference/chat)), you can access and test it in many ways:
@@ -95,7 +95,7 @@ python examples/inference/api_server_openai/query_openai_sdk.py
```
Or you can serve specific model to a simple endpoint according to the `port` and `route_prefix` parameters in configuration file,
```bash
python inference/serve.py --config_file inference/models/gpt2.yaml --simple
llm_on_ray-serve --config_file llm_on_ray/inference/models/gpt2.yaml --simple
```
After deploying the model endpoint, you can access and test it by using the script below:
```bash
23 changes: 0 additions & 23 deletions common/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/agentenv/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataprocesser/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/dataset/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/initializer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/model/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/optimizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/tokenizer/__init__.py

This file was deleted.

9 changes: 0 additions & 9 deletions common/trainer/__init__.py

This file was deleted.

3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.bigdl-cpu
@@ -27,7 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[bigdl-cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.cpu_and_deepspeed
@@ -27,7 +27,8 @@ RUN --mount=type=cache,target=/opt/conda/pkgs conda init bash && \
COPY ./pyproject.toml .
COPY ./MANIFEST.in .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu,deepspeed] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
3 changes: 2 additions & 1 deletion dev/docker/Dockerfile.vllm
@@ -28,7 +28,8 @@ COPY ./pyproject.toml .
COPY ./MANIFEST.in .
COPY ./dev/scripts/install-vllm-cpu.sh .

RUN mkdir ./finetune && mkdir ./inference
# create llm_on_ray package directory to bypass the following 'pip install -e' command
RUN mkdir ./llm_on_ray

RUN --mount=type=cache,target=/root/.cache/pip pip install -e .[cpu] --extra-index-url https://download.pytorch.org/whl/cpu \
--extra-index-url https://pytorch-extension.intel.com/release-whl/stable/cpu/us/
2 changes: 1 addition & 1 deletion docs/finetune.md
@@ -65,5 +65,5 @@ The following models have been verified on Intel CPUs or GPUs.
## Finetune the model
To finetune your model, execute the following command. The finetuned model will be saved in /tmp/llm-ray/output by default.
``` bash
python finetune/finetune.py --config_file <your finetuning conf file>
llm_on_ray-finetune --config_file <your finetuning conf file>
```
20 changes: 10 additions & 10 deletions docs/pretrain.md
@@ -122,28 +122,28 @@ Set up `megatron_deepspeed_path` in the configuration.

```bash
cd /home/user/workspace/llm-on-ray
#Bloom-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
#llama-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
# Bloom-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/bloom_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
# llama-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama_7b_megatron_deepspeed_zs0_8Gaudi_pretrain.conf
```

##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8Guadi_pretrain.conf
# llama-7B
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8Guadi_pretrain.conf
```
##### Nvidia GPU:
###### Megatron-DeepSpeed
```bash
cd /home/user/workspace/llm-on-ray
#llama2-7B
python pretrain/megatron_deepspeed_pretrain.py --config_file pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
# llama2-7B
llm_on_ray-megatron_deepspeed_pretrain --config_file llm_on_ray/pretrain/config/llama2_3b_megatron_deepspeed_zs0_8gpus_pretrain.conf
```
##### Huggingface Trainer
```bash
cd /home/user/workspace/llm-on-ray
#llama-7B
python pretrain/pretrain.py --config_file pretrain/config/llama_7b_8gpu_pretrain.conf
# llama-7B
llm_on_ray-pretrain --config_file llm_on_ray/pretrain/config/llama_7b_8gpu_pretrain.conf
```
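
Across this PR, the former `python <module>/<script>.py` invocations are replaced by console commands such as `llm_on_ray-finetune`, `llm_on_ray-serve`, `llm_on_ray-pretrain`, and `llm_on_ray-megatron_deepspeed_pretrain`, which are installed with the `llm_on_ray` package. A minimal sketch of how such a command typically maps onto the relocated modules; the entry-point target and argument handling below are assumptions for illustration, since the packaging changes themselves are not shown in this diff:

```python
# Illustrative sketch only: a main() that a console script like `llm_on_ray-finetune`
# could point to (e.g. an entry point such as llm_on_ray.finetune.finetune:main).
# The target name and behavior are assumptions; the real CLI lives in the package.
import argparse


def main() -> None:
    parser = argparse.ArgumentParser(prog="llm_on_ray-finetune")
    parser.add_argument(
        "--config_file",
        required=True,
        help="finetuning config, e.g. llm_on_ray/finetune/finetune.yaml",
    )
    args = parser.parse_args()
    # A real entry point would load the YAML config and launch finetuning on Ray here.
    print(f"finetuning with config: {args.config_file}")


if __name__ == "__main__":
    main()
```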