Embracing event-driven architecture to enhance the resilience of data solutions built on Amazon SageMaker
- Content
- Overview
- Target Audience
- Key Features
- Architecture
- Repository Structure
- Getting Started
- Security
- License
- Feedback
This repository contains AWS CDK code that captures and stores the system metadata of a data solution built on Amazon SageMaker at regular intervals. The code sample stores asset information for a data solution built on an Amazon DataZone domain and an Amazon SageMaker Unified Studio domain in an Amazon DynamoDB global table.
This solution is intended for data engineers, cloud engineers, cloud architects, and DevOps engineers.
The following diagram displays the data mesh reference architecture based on Amazon DataZone, using Amazon S3 and AWS Glue Data Catalog as data sources.
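As a rough illustration of what "storing asset information" in a DynamoDB table can look like, the sketch below maps one captured asset to a DynamoDB item in the low-level attribute-value format. The `AssetSnapshot` fields, the `pk`/`sk` key layout, and the sample values are all assumptions made for this example; they are not taken from the repository's actual schema.

```typescript
// Hypothetical shape of one captured asset. The field names are assumptions
// for illustration, not the repository's actual schema.
interface AssetSnapshot {
  domainId: string;
  assetId: string;
  name: string;
  typeIdentifier: string;
  capturedAt: Date;
}

// DynamoDB's low-level encoding for a string attribute.
type StringAttribute = { S: string };
type AssetItem = Record<
  "pk" | "sk" | "name" | "typeIdentifier" | "capturedAt",
  StringAttribute
>;

// Convert a snapshot into an item keyed by domain (partition key)
// and asset (sort key). The key layout is an assumed design choice.
function toDynamoItem(asset: AssetSnapshot): AssetItem {
  return {
    pk: { S: `DOMAIN#${asset.domainId}` },
    sk: { S: `ASSET#${asset.assetId}` },
    name: { S: asset.name },
    typeIdentifier: { S: asset.typeIdentifier },
    capturedAt: { S: asset.capturedAt.toISOString() },
  };
}

// Sample values are placeholders.
const item = toDynamoItem({
  domainId: "dzd_example",
  assetId: "asset-123",
  name: "sales_orders",
  typeIdentifier: "ExampleAssetType",
  capturedAt: new Date("2024-01-01T00:00:00Z"),
});

console.log(JSON.stringify(item, null, 2));
```

Keying items by domain and then by asset would let a single query return every asset recorded for one domain, which may be convenient when rebuilding a domain from the backup.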
-
One active AWS account.
-
AWS administrator credentials for the central governance account in your development environment.
-
AWS Command Line Interface (AWS CLI) installed to manage your AWS services from the command line (Recommended).
-
Node.js and Node Package Manager (npm) installed to manage AWS CDK applications. Install the AWS CDK Toolkit globally using npm:
npm install -g aws-cdk
-
TypeScript installed in your development environment. Install it globally using npm:
npm install -g typescript
-
Docker installed in your development environment (Recommended).
This section describes the steps to deploy the data governance solution.
This step provides instructions to set up your development environment. Ensure that the prerequisites are met before proceeding.
See Set up environment for more details.
In the AWS Account, the following key resources are deployed:
-
Amazon DataZone Domain: The Amazon DataZone domain helps you organize data assets, users, environments and projects in your organization.
-
Amazon DataZone data portal: The Amazon DataZone data portal is a browser-based web application where you can catalog, discover, govern, share, and analyze data in a self-service manner.
-
Amazon DataZone Producer Projects and Environments: The Amazon DataZone data producer projects and environments are deployed in the Central Governance Account. This solution uses the Data Lake blueprint to create the Amazon DataZone environment.
-
AWS IAM Roles for Data Users: The solution deploys AWS IAM roles corresponding to data users of the data mesh. The solution can be extended to SSO users.
-
Amazon DynamoDB Global Table: The Amazon DynamoDB global table stores information about DataZone assets for resilience and backup purposes.
-
AWS Step Functions: The AWS Step Functions state machine orchestrates the backup workflow.
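To make the orchestration role of the state machine more concrete, here is a minimal sketch of a two-step backup workflow written as an Amazon States Language definition in a plain TypeScript object. The state names, the Lambda ARNs (with `REGION` and `ACCOUNT_ID` placeholders), and the retry settings are hypothetical; the workflow deployed by this repository may be structured quite differently. A small helper checks that every transition points at a state that exists.

```typescript
// Minimal subset of Amazon States Language needed for this sketch.
type TaskState = {
  Type: "Task";
  Resource: string;
  Retry?: {
    ErrorEquals: string[];
    IntervalSeconds: number;
    MaxAttempts: number;
    BackoffRate: number;
  }[];
  Next?: string;
  End?: boolean;
};

interface StateMachineDefinition {
  Comment: string;
  StartAt: string;
  States: Record<string, TaskState>;
}

// Hypothetical workflow: list the domain's assets, then store them.
// State names and ARNs are placeholders, not the repository's resources.
const backupWorkflow: StateMachineDefinition = {
  Comment: "Illustrative backup workflow (assumed structure)",
  StartAt: "ListAssets",
  States: {
    ListAssets: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:list-assets",
      Retry: [
        {
          ErrorEquals: ["States.TaskFailed"],
          IntervalSeconds: 5,
          MaxAttempts: 3,
          BackoffRate: 2,
        },
      ],
      Next: "StoreAssets",
    },
    StoreAssets: {
      Type: "Task",
      Resource: "arn:aws:lambda:REGION:ACCOUNT_ID:function:store-assets",
      End: true,
    },
  },
};

// Report transitions that reference missing states, and states that
// neither continue (Next) nor terminate (End).
function findDanglingTransitions(def: StateMachineDefinition): string[] {
  const problems: string[] = [];
  if (!(def.StartAt in def.States)) {
    problems.push(`StartAt -> ${def.StartAt}`);
  }
  for (const name of Object.keys(def.States)) {
    const state = def.States[name];
    if (state.Next !== undefined && !(state.Next in def.States)) {
      problems.push(`${name} -> ${state.Next}`);
    }
    if (state.Next === undefined && state.End !== true) {
      problems.push(`${name} has neither Next nor End`);
    }
  }
  return problems;
}

const problems = findDanglingTransitions(backupWorkflow);
console.log(problems.length === 0 ? "definition is consistent" : problems);
```

Retrying the listing step with backoff is one possible way to tolerate transient failures; whether the actual workflow does so is not stated here.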
See Deploy for DataZone Domain for more details.
In the AWS Account, the following key resources are deployed:
-
Amazon DynamoDB Global Table: The Amazon DynamoDB global table stores information about DataZone assets for resilience and backup purposes.
-
AWS Step Functions: The AWS Step Functions state machine orchestrates the backup workflow.
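When a backup run writes many asset records to DynamoDB, one common approach is to group them into batches, because a single `BatchWriteItem` request accepts at most 25 put or delete operations. The sketch below shows only the batching step, with made-up asset identifiers; whether this solution writes in batches at all is an assumption of the example.

```typescript
// DynamoDB BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_WRITE_ITEMS = 25;

// Split a list into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// 60 placeholder asset identifiers, for illustration only.
const assetIds = Array.from({ length: 60 }, (_, i) => `asset-${i}`);
const batches = chunk(assetIds, MAX_BATCH_WRITE_ITEMS);

// Batch sizes for 60 items: 25, 25, 10
console.log(batches.map((batch) => batch.length));
```

Each batch could then be passed to one write request, with any unprocessed items retried by the caller.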
See Deploy for SageMaker Unified Studio Domain for more details.
Clean up the solution using the following steps.
See Clean up for more details.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.
Have an issue? Please create an issue in the GitHub repository.
