
Private LLM API with AWS SageMaker

This repository contains two major projects that work together to deploy and serve Large Language Models (LLMs) on AWS SageMaker.

Repository Overview

1. Infrastructure (CDK)

The infrastructure component is a CDK-based project that creates and manages the AWS resources, including:

  • Route 53: DNS management with SSL certificates
  • ECS Cluster: Container orchestration for the Model API service
  • DynamoDB: For storing API keys
  • SageMaker Endpoints: For hosting deployed LLMs
  • CloudWatch: Monitoring and logging

Architecture Diagram

2. Model API Application

An Express.js application that serves as a middleware between clients and LLMs deployed to SageMaker endpoints:

  • OpenAI-Compatible API: Drop-in replacement for OpenAI API clients
  • REST API: Endpoints for model inference, management, and configuration
  • Request Transformation: Converts API requests to SageMaker format
  • Response Processing: Processes and formats model responses
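As an illustration of the transformation and response-processing steps, here is a minimal sketch. It is not the actual middleware code: the SageMaker payload shape (an `inputs`/`parameters` JSON body) and the field names are assumptions, since the real format depends on the deployed model container.

```python
def to_sagemaker_payload(openai_request: dict) -> dict:
    """Convert an OpenAI-style chat request into a SageMaker-style payload.

    Hypothetical shape: the real payload depends on the model container.
    """
    return {
        "inputs": openai_request["messages"],
        "parameters": {
            "temperature": openai_request.get("temperature", 1.0),
            "top_p": openai_request.get("top_p", 1.0),
            "max_new_tokens": openai_request.get("max_tokens", 256),
        },
    }


def to_openai_response(model: str, generated_text: str) -> dict:
    """Wrap raw model output in an OpenAI-style chat completion response."""
    return {
        "object": "chat.completion",
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": generated_text},
                "finish_reason": "stop",
            }
        ],
    }


request = {
    "model": "llama3-2-11b",
    "temperature": 0,
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello"}],
}
payload = to_sagemaker_payload(request)
print(payload["parameters"]["max_new_tokens"])  # 1024
```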

Project Structure

  • infra/ - CDK-based infrastructure project that creates AWS resources including Route 53, DynamoDB, ECS cluster for the model API service, SageMaker endpoints, and related components.

  • model-api-app/ - Express.js application that serves LLMs deployed to SageMaker endpoints. This REST API is OpenAI-compatible and handles transformation between API requests and the SageMaker format.

Prerequisites

  • AWS CLI
  • AWS CDK CLI
  • Docker
  • Node.js 20.x or later
  • AWS Account and credentials configured
  • Increased service quota for the SageMaker endpoint instance type (ml.g6.12xlarge)

Getting Started

Infrastructure Deployment

  1. Prepare environment variables:

     cp .env.example .env

  2. Update environment variables:

     APP_NAME="llm-sagemaker"
     HUGGINGFACE_TOKEN="your-huggingface-token"
     VPC_ID="your-vpc-id"
     SUBNET_TYPE="PRIVATE" # PUBLIC or PRIVATE
     DOMAIN_NAME="your-domain-name.com"

  3. Navigate to the infrastructure directory:

     cd infra

  4. Install dependencies:

     npm install

  5. Deploy to AWS:

     npm run deploy:bootstrap

Grab the API Key

During deployment, an API key is generated, stored in the DynamoDB table, and printed to the console. Copy the API key and save it for later use.
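If you miss the console output, the key can also be read back from the DynamoDB table. The sketch below is an assumption, not part of this repository: the table name (`llm-sagemaker-api-keys`) and the attribute name (`apiKey`) are placeholders, so check the CDK output for the actual names.

```python
def extract_api_keys(scan_response: dict) -> list:
    """Pull API key strings out of a DynamoDB Table.scan() response.

    Assumes each item stores the key under an 'apiKey' attribute
    (a placeholder name; check the actual table schema).
    """
    return [item["apiKey"] for item in scan_response.get("Items", [])]


def fetch_api_keys(table_name: str = "llm-sagemaker-api-keys") -> list:
    """Query DynamoDB for stored keys (requires AWS credentials and boto3)."""
    import boto3  # imported here so extract_api_keys stays dependency-free

    table = boto3.resource("dynamodb").Table(table_name)
    return extract_api_keys(table.scan())


# Parsing works the same on a canned response, no AWS access needed:
sample = {"Items": [{"apiKey": "abc123"}]}
print(extract_api_keys(sample))  # ['abc123']
```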

Swagger UI

The Swagger UI is available at https://genai.<your-domain-name>/api-docs/

How to use the API

The API is OpenAI-compatible, so you can use it as a drop-in replacement for the OpenAI API. There are a few examples under the examples folder. For example:

First install the OpenAI client:

    pip install openai

Then:

```python
import base64

from openai import OpenAI

# Point the client at your private endpoint instead of api.openai.com
client = OpenAI(
    base_url="https://genai.<your-domain-name>/v1",
    api_key="your-api-key",
)


if __name__ == "__main__":
    image_path = "../ingredients.png"
    model = "llama3-2-11b"
    instruction = "What are the ingredients in this image?"

    with open(image_path, "rb") as image_file:
        image_buffer = image_file.read()

    base64_image = base64.b64encode(image_buffer).decode("utf-8")

    completion = client.chat.completions.create(
        model=model,
        temperature=0,
        top_p=0.90,
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": instruction},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}",
                            "detail": "auto",
                        },
                    },
                ],
            },
        ],
    )
    print(completion.choices[0].message.content)
    print(completion.usage)
```

Running the Model API Locally

  1. Navigate to the model API directory:

     cd model-api-app

  2. Install dependencies:

     npm install

  3. Start the API server:

     npm run dev
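Once the server is up, you can exercise it with any OpenAI-compatible client or plain HTTP. The helper below sketches the raw request shape; the local port (3000) is an assumption, so adjust it to whatever the dev server actually prints on startup.

```python
import json


def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Build the URL, headers, and JSON body for a chat completion call.

    Returns a (url, headers, body) triple that can be sent with any
    HTTP client (urllib, requests, curl, ...).
    """
    url = f"{base_url}/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    body = json.dumps(
        {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        }
    )
    return url, headers, body


# The local port is an assumption; adjust to match your dev server.
url, headers, body = build_chat_request(
    "http://localhost:3000", "your-api-key", "llama3-2-11b", "Hello"
)
print(url)  # http://localhost:3000/v1/chat/completions
```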

API Documentation

The Model API provides an OpenAI-compatible interface for interacting with LLMs deployed to SageMaker.

Full API documentation is available via Swagger UI:

https://genai.<your-domain-name>/api-docs/
