This repository contains two major projects that work together to deploy and serve Large Language Models (LLMs) on AWS SageMaker.
The infrastructure component is a CDK-based project that creates and manages the AWS resources, including:
- Route 53: DNS management with SSL certificates
- ECS Cluster: Container orchestration for the Model API service
- DynamoDB: Storage for API keys
- SageMaker Endpoints: Hosting for deployed LLMs
- CloudWatch: Monitoring and logging
An Express.js application that serves as middleware between clients and LLMs deployed to SageMaker endpoints:
- OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
- REST API: Endpoints for model inference, management, and configuration
- Request Transformation: Converts API requests to SageMaker format
- Response Processing: Processes and formats model responses
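The request-transformation step can be pictured with a small sketch. This is illustrative only: the actual payload shape depends on the model container serving the endpoint, and the field names below (`inputs`, `parameters`, `max_new_tokens`) are assumptions, not the service's real schema.

```python
# Hypothetical sketch of the OpenAI-to-SageMaker request mapping performed by
# the middleware. Field names are illustrative; the real payload depends on the
# container serving the endpoint.

def to_sagemaker_payload(openai_request: dict) -> dict:
    """Map an OpenAI chat-completion request to a generic inference payload."""
    return {
        "inputs": [
            {"role": m["role"], "content": m["content"]}
            for m in openai_request["messages"]
        ],
        "parameters": {
            "temperature": openai_request.get("temperature", 1.0),
            "top_p": openai_request.get("top_p", 1.0),
            "max_new_tokens": openai_request.get("max_tokens", 256),
        },
    }
```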
- infra/ - CDK-based infrastructure project that creates AWS resources including Route 53, DynamoDB, the ECS cluster for the Model API service, SageMaker endpoints, and related components.
- model-api-app/ - Express.js application that serves LLMs deployed to SageMaker endpoints. This REST API is OpenAI-compatible and handles the transformation between API requests and the SageMaker format.
- AWS CLI
- AWS CDK CLI
- Docker
- Node.js 20.x or later
- AWS Account and credentials configured
- Increased service quota for the SageMaker endpoint instance type (`ml.g6.12xlarge`)
- Prepare the environment variables:

```shell
cp .env.example .env
```
- Update the environment variables:

```shell
APP_NAME="llm-sagemaker"
HUGGINGFACE_TOKEN="your-huggingface-token"
VPC_ID="your-vpc-id"
SUBNET_TYPE="PRIVATE" # PUBLIC or PRIVATE
DOMAIN_NAME="your-domain-name.com"
```
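Before deploying, it can help to confirm that every variable the stack expects is present in `.env`. A minimal sketch (the variable list mirrors `.env.example` above; `check_env_vars` is a hypothetical helper, not part of the repository):

```shell
# Hypothetical helper: fail fast if the given env file is missing a required variable.
check_env_vars() {
  file="$1"
  for var in APP_NAME HUGGINGFACE_TOKEN VPC_ID SUBNET_TYPE DOMAIN_NAME; do
    grep -q "^${var}=" "$file" || { echo "missing: $var"; return 1; }
  done
  echo "ok"
}

# Usage: check_env_vars .env
```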
- Navigate to the infrastructure directory:

```shell
cd infra
```

- Install dependencies:

```shell
npm install
```

- Deploy to AWS:

```shell
npm run deploy:bootstrap
```
During deployment, an API key is generated and stored in the DynamoDB table; it is also printed to the console. Copy the API key and save it for later use.
The Swagger UI is available at https://genai.<your-domain-name>/api-docs/
The API is OpenAI-compatible, so you can use it as a drop-in replacement for OpenAI API clients. There are a few examples under the `examples/` folder. For example:
```shell
pip install openai
```
```python
import base64

from openai import OpenAI

client = OpenAI(
    base_url="https://genai.<your-domain-name>/v1",
    api_key="your-api-key",
)

if __name__ == "__main__":
    image_path = "../ingredients.png"
    model = "llama3-2-11b"
    instruction = "What are the ingredients in this image?"

    # Read the image and encode it as a base64 data URL.
    with open(image_path, "rb") as image_file:
        image_buffer = image_file.read()
    base64_image = base64.b64encode(image_buffer).decode("utf-8")

    completion = client.chat.completions.create(
        model=model,
        temperature=0,
        top_p=0.90,
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": instruction},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            "detail": "auto",
                        },
                    },
                ],
            },
        ],
    )
    print(completion.choices[0].message.content)
    print(completion.usage)
```
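For plain text prompts the request body is simpler. A minimal sketch using the same model name and client configuration as the image example (the `build_chat_request` helper is illustrative, not part of the repository):

```python
# Illustrative helper: assemble the arguments for a text-only chat completion.

def build_chat_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Return keyword arguments for client.chat.completions.create()."""
    return {
        "model": model,
        "temperature": 0,
        "top_p": 0.90,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the client configured as in the image example:
# completion = client.chat.completions.create(
#     **build_chat_request("llama3-2-11b", "Summarize the deployment steps.")
# )
# print(completion.choices[0].message.content)
```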
- Navigate to the model API directory:

```shell
cd model-api-app
```

- Install dependencies:

```shell
npm install
```

- Start the API server:

```shell
npm run dev
```
The Model API provides an OpenAI-compatible interface for interacting with LLMs deployed to SageMaker.
Full API documentation is available via Swagger UI at https://genai.<your-domain-name>/api-docs/