Commit 3026747

RAG-LLM GitOps Pattern on MS Azure
1 parent 812d93f commit 3026747

File tree: 4 files changed, +226 -2 lines changed
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
title: RAG-LLM GitOps Pattern on Microsoft Azure
date:
validated: false
tier: sandbox
summary:
rh_products:
- Red Hat OpenShift Container Platform
- Red Hat OpenShift AI
partners:
- Microsoft
industries:
- General
aliases: /azure-rag-llm-gitops/
#pattern_logo:
links:
  github: https://github.com/validatedpatterns/rag-llm-gitops
  install: getting-started
  bugs: https://github.com/validatedpatterns/rag-llm-gitops/issues
  feedback: https://docs.google.com/forms/d/e/1FAIpQLScI76b6tD1WyPu2-d_9CCVDr3Fu5jYERthqLKJDUGwqBg7Vcg/viewform
  ci: ragllm
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="about-azure-rag-llm-gitops-pattern"]
== About the RAG-LLM GitOps Pattern on Microsoft Azure

The RAG-LLM GitOps Pattern offers a robust and scalable solution for deploying LLM-based applications with integrated retrieval capabilities on Microsoft Azure. By embracing GitOps principles, this pattern ensures automated, consistent, and auditable deployments. It streamlines the setup of complex LLM architectures, allowing users to focus on application development rather than intricate infrastructure provisioning.

[id="solution-elements-and-technologies"]
== Solution elements and technologies

The RAG-LLM GitOps Validated Pattern leverages the following key technologies and components:

* **Azure Red Hat OpenShift (ARO)**: The foundation for container orchestration and application deployment.
* **Azure SQL Server**: The default relational database backend for storing vector embeddings.
* **Hugging Face models**: Used for both embedding generation and large language model inference.
* **{rh-gitops} (Argo CD)**: The primary driver for automated deployment and continuous synchronization of the pattern's components.
* **vLLM**: An optimized inference engine for large language models, deployed on GPU-enabled nodes; a sample endpoint query follows this list.
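
Once the pattern is running, vLLM serves an OpenAI-compatible HTTP API. As a quick smoke test, you can query the completions endpoint directly; this is a minimal sketch in which the route hostname and model identifier are placeholders for values from your own deployment:

[source,terminal]
----
# Look up the actual route first, for example: oc get routes -A | grep -i vllm
curl -s https://<vllm-route-host>/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-id>", "prompt": "What is retrieval-augmented generation?", "max_tokens": 64}'
----
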
Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
---
title: Getting Started
weight: 10
aliases: /getting-started/
---

:toc:
:imagesdir: /images
:_content-type: ASSEMBLY
include::modules/comm-attributes.adoc[]

[id="installing-rag-llm-azure-pattern"]
== Installing the RAG-LLM GitOps Pattern on Microsoft Azure

.Prerequisites

* You are logged in to an existing Azure Red Hat OpenShift (ARO) cluster with administrative privileges.
* Your Azure subscription has the GPU quota required to provision compute resources for the vLLM inference service. The default VM size is `Standard_NC8as_T4_v3`, which requires quota for at least 8 vCPUs.
* You have a Hugging Face token.
* You have a database server:
** Azure SQL database server: the default vector database for deploying the RAG-LLM GitOps Pattern on Azure.
** (Optional) Local databases: you can instead deploy Redis, PostgreSQL (EDB), or Elasticsearch (ELASTIC) directly within your cluster. If you choose a local database, ensure that it is provisioned and accessible before deployment.

[IMPORTANT]
====
* To select your database type, edit the `overrides/values-Azure.yaml` file:
+
[source,yaml]
----
global:
  db:
    type: "AZURESQL" # Options: AZURESQL, REDIS, EDB, ELASTIC
----

* When choosing local database instances such as Redis, PostgreSQL, or Elasticsearch, ensure that your cluster has sufficient resources available.
====

[id="overview-of-the-installation-workflow_{context}"]
40+
== Overview of the installation workflow
41+
To install the RAG-LLM GitOps Pattern on Microsoft Azure, you must complete the following setup and configurations:
42+
43+
* xref:creating-huggingface-token[Create a Hugging face token]
44+
* xref:deploying-azure-sql-server[Deploy Azure SQL]
45+
* xref:creating-secret-credentials[Create required secrets]
46+
* xref:provisioning-gpu-nodes[Create GPU nodes]
47+
* xref:deploy-rag-llm-azure-pattern[Install the RAG-LLM GitOps Pattern on Microsoft Azure]
48+
49+
[id="creating-huggingface-token_{context}"]
50+
=== Creating a Hugging Face token
51+
.Procedure
52+
53+
. To obtain a Hugging Face token, navigate to the link:https://huggingface.co/settings/tokens[Hugging Face] site.
54+
. Log in to your account.
55+
. Go to your *Settings* -> *Access Tokens*.
56+
. Create a new token with appropriate permissions. Ensure you accept the terms of the specific model you plan to use, as required by Hugging Face. For example, Mistral-7B-Instruct-v0.3-AWQ
57+
58+
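Before storing the token in your secrets file, you might want to confirm that it is valid. One quick check, assuming outbound network access from your workstation, is to call the Hugging Face `whoami` API; the token value shown here is a placeholder:

[source,terminal]
----
# A valid token returns your account details as JSON; an invalid one returns an error
curl -s -H "Authorization: Bearer <hf_your_huggingface_token>" \
  https://huggingface.co/api/whoami-v2
----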
[id="deploying-azure-sql-server_{context}"]
59+
=== Deploying Azure SQL Server
60+
61+
.Procedure
62+
63+
. Navigate to the Azure portal and create a new SQL Database server.
64+
. When prompted for authentication, select `Use SQL authentication`.
65+
. Record the generated *Server name*, *Server admin login*, and *Password*. These credentials will be used later when creating secrets.
66+
. On the *Networking* tab, ensure that *Allow Azure services and resources to access this server is set* to *Yes*. This allows your ARO cluster to connect to the database.
67+
. Click *Review + create*, and then click *Create*.
68+
69+
Wait for the SQL Server deployment to complete and become active before proceeding.
70+
71+
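If you prefer the command line to the portal, an equivalent setup can be sketched with the Azure CLI. The resource group, server name, and credentials below are placeholder values; the special `0.0.0.0` firewall rule corresponds to the *Allow Azure services and resources to access this server* setting:

[source,terminal]
----
# Create the SQL server (placeholder names and credentials)
az sql server create \
  --name <yourservername> \
  --resource-group <your-resource-group> \
  --location <region> \
  --admin-user <adminuser> \
  --admin-password <your_password>

# Allow Azure services (including your ARO cluster) to reach the server
az sql server firewall-rule create \
  --resource-group <your-resource-group> \
  --server <yourservername> \
  --name AllowAzureServices \
  --start-ip-address 0.0.0.0 --end-ip-address 0.0.0.0
----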
[id="creating-secret-credentials_{context}"]
72+
=== Creating secret credentials
73+
74+
To securely store your sensitive credentials, create a YAML file named `~/values-secret-rag-llm-gitops.yaml`. This file is used during the pattern deployment; however, you must not commit it to your Git repository.
75+
76+
[IMPORTANT]
77+
====
78+
If you’re not using Azure SQL Server, omit the entire `azuresql` section.
79+
====
80+
81+
[source,yaml]
82+
----
83+
# ~/values-secret-rag-llm-gitops.yaml
84+
# Replace placeholders with your actual credentials
85+
version: "2.0"
86+
87+
secrets:
88+
- name: hfmodel
89+
fields:
90+
- name: hftoken <1>
91+
value: <hf_your_huggingface_token>
92+
- name: azuresql
93+
fields:
94+
- name: user <2>
95+
value: <adminuser>
96+
- name: password <3>
97+
value: <your_password>
98+
- name: server <4>
99+
value: <yourservername.database.windows.net>
100+
----
101+
<1> Specify your Hugging Face token.
102+
<2> Specify the username for your Azure SQL server.
103+
<3> Specify the password for your Azure SQL server.
104+
<4> Specify the fully qualified of your Azure SQL server name.
105+
106+
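With the file in place, the deployment reads it during installation. If you change credentials after the initial install, the validated patterns tooling also provides a target to reload them; this is a sketch that assumes the standard framework `Makefile`:

[source,terminal]
----
# Re-reads ~/values-secret-rag-llm-gitops.yaml and pushes the secrets to the cluster
./pattern.sh make load-secrets
----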
[id="provisioning-gpu-nodes_{context}"]
107+
=== Provisioning GPU nodes
108+
109+
The vLLM inference service requires dedicated GPU nodes with a specific taint. You can provision these nodes by using one of the following methods:
110+
111+
Automatic Provisioning:: The pattern includes capabilities to automatically provision GPU-enabled `MachineSet` resources.
112+
+
113+
Run the following command to create a single Standard_NC8as_T4_v3 GPU node:
114+
+
115+
[source,terminal]
116+
----
117+
./pattern.sh make create-gpu-machineset-azure
118+
----
119+
120+
Customizable Method:: For environments requiring more granular control, you can manually create a `MachineSet` with the necessary GPU instance types and apply the required taint. For more information on creating custom `MachineSet` resources for ARO cluster, see link:https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/machine_management/managing-compute-machines-with-the-machine-api#creating-machineset-azure[Creating a compute machine set on Azure]
121+
+
122+
To control GPU node specifics, provide additional parameters:
123+
+
124+
[source,terminal]
125+
----
126+
./pattern.sh make create-gpu-machineset-azure GPU_REPLICAS=3 OVERRIDE_ZONE=2 GPU_VM_SIZE=Standard_NC16as_T4_v3
127+
----
128+
+
129+
where:
130+
+
131+
- `GPU_REPLICAS` is the umber of GPU nodes to provision.
132+
+
133+
- (Optional): `OVERRIDE_ZONE` is the availability zone .
134+
+
135+
- `GPU_VM_SIZE` is the Azure VM SKU for GPU nodes.
136+
+
137+
The script automatically applies the required taint. The NVIDIA GPU Operator that is installed by the pattern manages the CUDA driver installation on GPU nodes.
138+
139+
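After the `MachineSet` scales up, you can verify that the GPU nodes joined the cluster and carry the expected taint. This is a generic verification sketch; the `nvidia.com/gpu.present` label is applied by Node Feature Discovery when the NVIDIA GPU Operator is in place:

[source,terminal]
----
# List the machine sets and their replica counts
oc get machinesets -n openshift-machine-api

# Confirm that the new nodes are Ready and advertise a GPU
oc get nodes -l nvidia.com/gpu.present=true

# Inspect the taints on one of the GPU nodes
oc describe node <gpu-node-name> | grep -A2 Taints
----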
[id="deploy-rag-llm-azure-pattern_{context}"]
140+
=== Deploying the RAG-LLM GitOps Pattern
141+
142+
To deploy the RAG-LLM GitOps Pattern to your ARO cluster, run the following command:
143+
144+
[source,terminal]
145+
----
146+
pattern.sh make install
147+
----
148+
149+
This command initiates the GitOps-driven deployment process, which installs and configures all RAG-LLM components on your ARO cluster based on the provided values and secrets.
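
To watch the rollout, you can track the Argo CD applications that the pattern creates; this is a generic check for GitOps-based validated patterns, and namespaces can vary with your configuration:

[source,terminal]
----
# Argo CD Application resources report Synced/Healthy as components come up
oc get applications -A
----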

content/patterns/medical-diagnosis/getting-started.adoc

Lines changed: 0 additions & 2 deletions
@@ -14,8 +14,6 @@ include::modules/comm-attributes.adoc[]
 
 .Prerequisites
 
-.Prerequisites
-
 * An OpenShift cluster
 ** To create an OpenShift cluster, go to the https://console.redhat.com/[Red Hat Hybrid Cloud console].
 ** Select *OpenShift \-> Red Hat OpenShift Container Platform \-> Create cluster*.
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
:_content-type: CONCEPT
:imagesdir: ../../images

[id="configuration-options_{context}"]
= Configuration options

The RAG-LLM GitOps pattern offers extensive configuration options through its Helm chart values, enabling you to tailor the deployment to your specific use case, data sources, and model requirements.

[id="document-sources-for-rag-db-population_{context}"]
== Document sources for RAG DB population

To populate your vector database with relevant documents, you can specify various sources within the pattern's configuration. This is typically managed under the `populateDbJob` section of the Helm values.

* Git repository sources (`populateDbJob.repoSources`): Specify documents from Git repositories. You can use glob patterns to include or exclude specific file types from these repositories, as sketched in the example after this list.
+
[TIP]
====
To optimize retrieval quality and performance, restrict Git repository sources to file types that are suitable for semantic search (for example, `.txt`, `.md`, `.pdf`, `.json`). Avoid including binary files or irrelevant content that could degrade search accuracy.
====

* Web page sources (`populateDbJob.webSources`): Include content directly from specified web pages.
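For illustration, a values snippet for these sources might look like the following. The field names inside each entry (`repo`, `globs`, `urls`) are assumptions here; check the pattern's Helm chart for the authoritative schema:

[source,yaml]
----
populateDbJob:
  repoSources:
    - repo: https://github.com/<org>/<docs-repo>  # hypothetical repository
      globs:
        - "**/*.md"   # include Markdown docs
        - "**/*.pdf"  # include PDFs
  webSources:
    - urls:
        - https://example.com/product-faq  # hypothetical page
----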
[id="embedding-and-llm-inference-models_{context}"]
24+
== Embedding and LLM inference models
25+
26+
The models used for generating embeddings and performing LLM inference are defined in the `values-global.yaml` file:
27+
28+
* LLM Inference Model:Configured under `global.model.vllm`. This typically specifies the Hugging Face model identifier for the large language model.
29+
* Embedding Model: Configured under `global.model.embedding`. This specifies the Hugging Face model identifier for the text embedding model.
30+
31+
Both models should be compatible with the Hugging Face ecosystem. When deploying in cloud environments such as Azure, carefully consider the VRAM requirements of your chosen models to ensure that your provisioned GPU nodes have sufficient memory for optimal performance and to avoid resource contention.
32+
33+
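A minimal sketch of the corresponding `values-global.yaml` entries follows; the model identifiers are illustrative placeholders, not the pattern's defaults:

[source,yaml]
----
global:
  model:
    vllm: mistralai/Mistral-7B-Instruct-v0.3           # hypothetical LLM identifier
    embedding: sentence-transformers/all-MiniLM-L6-v2  # hypothetical embedding model
----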

.Additional resource
* link:https://validatedpatterns.io/blog/2025-06-10-rag-llm-gitops-configuration/[How to Configure the RAG-LLM GitOps Pattern for Your Use Case]
