In this repo you'll learn how to use AWS Trainium and AWS Inferentia with Amazon SageMaker and Hugging Face Optimum Neuron to optimize your ML workloads! Here you'll find workshops, tutorials, blog post content, and more that you can use to learn from and to inspire your own solutions.
The content you find here is focused on particular use cases. If you're looking for standalone model samples for inference and training, please check this other repo: https://github.com/aws-neuron/aws-neuron-samples.
| Title | Description |
|---|---|
| Fine-tune and deploy an LLM from Hugging Face on AWS Trainium and AWS Inferentia | Learn how to create a spam classifier that can be easily integrated into your own application (see the Optimum Neuron sketch after the table) |
| Adapting LLMs for domain-aware applications with AWS Trainium post-training | Learn how to adapt a pre-trained model to your own business needs and add a conversational interface your customers can interact with |
| Building Custom Accelerator Kernels with AWS Neuron Kernel Interface (NKI) | Learn how to use the Neuron Kernel Interface (NKI) to write kernels for Neuron accelerators (see the NKI sketch after the table) |
| Learn how to reduce costs with AWS Inferentia chips when serving Automatic Speech Recognition models | Learn how to use Hugging Face Optimum Neuron and manual porting approaches to compile ASR models and deploy them on AWS Inferentia chips |
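To give a feel for the Optimum Neuron workflow behind the fine-tune-and-deploy and ASR entries above, here is a minimal sketch of compiling and running a text classifier on Inferentia. The model id, input shapes, and sample text are illustrative assumptions, not taken from any of the workshops:

```python
# A minimal sketch, assuming an Inferentia (inf2) instance with the Neuron SDK
# and `optimum[neuronx]` installed; the model id and shapes are illustrative.
from optimum.neuron import NeuronModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # hypothetical choice

# export=True traces and compiles the model into a Neuron graph; Neuron graphs
# use static shapes, so batch size and sequence length are fixed at compile time.
model = NeuronModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=128,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inputs must be padded to the compiled sequence length.
inputs = tokenizer(
    "Congratulations, you won a free prize! Click here.",
    return_tensors="pt",
    padding="max_length",
    max_length=128,
    truncation=True,
)
logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax())])
```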
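For the NKI entry, the following is a minimal sketch of what a kernel looks like, reduced to a single fixed-size tile; the 128x512 tile shape and tensor names are assumptions for illustration, not a general-purpose kernel:

```python
# A minimal sketch of an NKI element-wise add kernel, assuming the Neuron SDK's
# neuronxcc package is installed.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the kernel's output tensor in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Index a single 128 x 512 tile.
    ix = nl.arange(128)[:, None]
    iy = nl.arange(512)[None, :]

    # Load input tiles from device memory (HBM) into on-chip memory (SBUF).
    a_tile = nl.load(a_input[ix, iy])
    b_tile = nl.load(b_input[ix, iy])

    # Compute on-chip and store the result back to HBM.
    nl.store(c_output[ix, iy], value=a_tile + b_tile)
    return c_output
```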
The following workshops are supported by AWS Workshop Studio.
| Title |
|---|
| Llama3-8B Deployment on AWS Inferentia 2 with Amazon EKS and vLLM (see the vLLM sketch below) |
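As a taste of the serving layer that workshop builds on, here is a minimal sketch of running Llama3-8B with vLLM's Neuron backend on a single Inferentia 2 host (the EKS packaging is what the workshop itself covers). The model id, device argument, and parallelism settings are assumptions and vary across vLLM versions:

```python
# A minimal sketch, assuming a vLLM build with the AWS Neuron backend on an
# inf2 instance; argument names and supported values differ between versions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated on the Hub; illustrative
    device="neuron",         # select the Neuron backend instead of CUDA
    tensor_parallel_size=2,  # shard the model across two NeuronCores
    max_model_len=4096,      # Neuron graphs are compiled for a fixed context
)

outputs = llm.generate(
    ["What is AWS Inferentia?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```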
If you have questions, comments, or suggestions, please feel free to open an issue in this repo.
Also, please refer to the CONTRIBUTING document for further details on contributing to this repository.