In this repo you'll learn how to use AWS Trainium and AWS Inferentia with Amazon SageMaker and Hugging Face Optimum Neuron to optimize your ML workloads! Here you'll find workshops, tutorials, blog post content, and more that you can use to learn from and to inspire your own solutions.
The content you find here is focused on particular use cases. If you're looking for standalone model samples for inference and training, please check this other repo: https://github.com/aws-neuron/aws-neuron-samples.
| Title | Description |
|---|---|
| Fine-tune and deploy an LLM from Hugging Face on AWS Trainium and AWS Inferentia | Learn how to create a spam classifier that can be easily integrated into your own application (see the Optimum Neuron sketch after the table) |
| Adapting LLMs for domain-aware applications with AWS Trainium post-training | Learn how to adapt a pre-trained model to your own business needs and add a conversational interface your customers can interact with |
| Building Custom Accelerator Kernels with AWS Neuron Kernel Interface (NKI) | Learn how to use the Neuron Kernel Interface (NKI) to write kernels for Neuron accelerators (see the NKI sketch after the table) |
| Learn how to reduce costs with AWS Inferentia chips when serving Automatic Speech Recognition models | Learn how to use Hugging Face Optimum Neuron and manual porting approaches to compile ASR models and deploy them on AWS Inferentia chips |
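To give a feel for the Optimum Neuron workflow behind the fine-tune-and-deploy and ASR entries above, here is a minimal sketch of compiling and running a text classifier on Inferentia. The model id, input shapes, and sample text are illustrative assumptions, not taken from any of the workshops:

```python
# A minimal sketch, assuming an Inferentia (inf2) instance with the Neuron SDK
# and `optimum[neuronx]` installed; the model id and shapes are illustrative.
from optimum.neuron import NeuronModelForSequenceClassification
from transformers import AutoTokenizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # hypothetical choice

# export=True traces and compiles the model into a Neuron graph; Neuron graphs
# use static shapes, so batch size and sequence length are fixed at compile time.
model = NeuronModelForSequenceClassification.from_pretrained(
    model_id,
    export=True,
    batch_size=1,
    sequence_length=128,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inputs must be padded to the compiled sequence length.
inputs = tokenizer(
    "Congratulations, you won a free prize! Click here.",
    return_tensors="pt",
    padding="max_length",
    max_length=128,
    truncation=True,
)
logits = model(**inputs).logits
print(model.config.id2label[int(logits.argmax())])
```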
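For the NKI entry, the following is a minimal sketch of what a kernel looks like, reduced to a single fixed-size tile; the 128x512 tile shape and tensor names are assumptions for illustration, not a general-purpose kernel:

```python
# A minimal sketch of an NKI element-wise add kernel, assuming the Neuron SDK's
# neuronxcc package is installed.
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def tensor_add_kernel(a_input, b_input):
    # Allocate the kernel's output tensor in device memory (HBM).
    c_output = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)

    # Index a single 128 x 512 tile.
    ix = nl.arange(128)[:, None]
    iy = nl.arange(512)[None, :]

    # Load input tiles from device memory (HBM) into on-chip memory (SBUF).
    a_tile = nl.load(a_input[ix, iy])
    b_tile = nl.load(b_input[ix, iy])

    # Compute on-chip and store the result back to HBM.
    nl.store(c_output[ix, iy], value=a_tile + b_tile)
    return c_output
```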
The following workshops are supported by AWS Workshop Studio.
| Title |
|---|
| Llama3-8B Deployment on AWS Inferentia 2 with Amazon EKS and vLLM (see the vLLM sketch below) |
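As a taste of the serving layer that workshop builds on, here is a minimal sketch of running Llama3-8B with vLLM's Neuron backend on a single Inferentia 2 host (the EKS packaging is what the workshop itself covers). The model id, device argument, and parallelism settings are assumptions and vary across vLLM versions:

```python
# A minimal sketch, assuming a vLLM build with the AWS Neuron backend on an
# inf2 instance; argument names and supported values differ between versions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # gated on the Hub; illustrative
    device="neuron",         # select the Neuron backend instead of CUDA
    tensor_parallel_size=2,  # shard the model across two NeuronCores
    max_model_len=4096,      # Neuron graphs are compiled for a fixed context
)

outputs = llm.generate(
    ["What is AWS Inferentia?"],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```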
If you have questions, comments, or suggestions, please feel free to open an issue in this repo.
Also, please refer to the CONTRIBUTING document for further details on contributing to this repository.