⚠️ Unstable: Project Still Under Development ⚠️

Sonnet Scripts

Sonnet Scripts is a collection of pre-built data architecture patterns that you can quickly spin up on a local machine, along with examples of real-world data that you can use with them.

Why was Sonnet Scripts created?

One of the challenges of making content and tutorials on data is the lack of established data infrastructure and real-world datasets. I often found myself rebuilding the same infrastructure and tracking down datasets from scratch, so I created this open-source repo to expedite that process.

Why sonnets?

According to the Academy of American Poets, a "...sonnet is a fourteen-line poem written in iambic pentameter, employing one of several rhyme schemes, and adhering to a tightly structured thematic organization." Working within the constraints of a particular sonnet form, poets throughout the centuries have pushed their creativity to express themselves, with William Shakespeare being one of the most well-known. I've seen data architectures fill a similar role, where their specific patterns push data practitioners to think of creative ways to solve business problems.

How to use Sonnet Scripts

πŸ— Sonnet Scripts - Data & Analytics Sandbox

Introduction

Welcome to Sonnet Scripts – a fully containerized environment designed for data analysts, analytics engineers, and data engineers to experiment with databases, queries, and ETL pipelines. This repository provides a pre-configured sandbox where users can ingest data, transform it using SQL/Python, and test integrations with PostgreSQL, DuckDB, MinIO, and more!

Who is this for?

This project is ideal for:

  • Data Engineers who want a lightweight environment for testing data pipelines.
  • Analytics Engineers experimenting with dbt and SQL transformations.
  • Data Analysts looking for a structured PostgreSQL + DuckDB setup.
  • Developers working on data APIs using Python.

🛠 Prerequisites

Before setting up the environment, ensure you have the following installed:

  1. Docker & Docker Compose

  2. Make (for automation)

    • Linux/macOS: Typically pre-installed (on macOS it ships with the Xcode Command Line Tools)
    • Windows: Install via Chocolatey → choco install make
  3. Python (3.12+)


🚀 Quick Start

1️⃣ Clone the Repository

git clone https://github.com/onthemarkdata/sonnet-scripts.git
cd sonnet-scripts

2️⃣ Start the Environment

make setup

This will:

  • Build the Docker images
  • Start the PostgreSQL, DuckDB, and other containers
  • Ensure dependencies are installed

3️⃣ Load Sample Data

make load-db
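For a rough idea of what a loading step like this involves, here is a minimal Python sketch of bulk-loading a CSV into PostgreSQL with COPY. The connection details, table schema, and file path are illustrative assumptions, not the repo's actual configuration:

# Hypothetical sketch of a CSV -> PostgreSQL bulk load; host, credentials,
# table, and file path are assumptions, not the Makefile's real values.
import psycopg2

conn = psycopg2.connect(
    host="localhost", port=5432,   # assumed: Postgres port mapped by docker-compose
    dbname="sonnet", user="postgres", password="postgres",
)
with conn, conn.cursor() as cur:
    # Create a landing table, then stream the CSV in via COPY (fast bulk load).
    cur.execute("""
        CREATE TABLE IF NOT EXISTS raw_events (
            event_id   INTEGER,
            event_type TEXT,
            created_at TIMESTAMP
        )
    """)
    with open("data/sample.csv") as f:   # assumed path to a sample dataset
        cur.copy_expert("COPY raw_events FROM STDIN WITH CSV HEADER", f)
conn.close()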

4️⃣ Verify Data Loaded into Database

make verify-db
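Verification can be as simple as a row count against the loaded table. A minimal sketch, reusing the same assumed connection details and table name as above (the real make target may check different objects):

# Assumed connection details and table name, for illustration only.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432,
                        dbname="sonnet", user="postgres", password="postgres")
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM raw_events")
    print("raw_events rows:", cur.fetchone()[0])
conn.close()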

5️⃣ Run Tests

make test

6️⃣ Access the PythonBase Command Line Interface (CLI)

make exec-pythonbase

7️⃣ Access the PostgreSQL Database

make exec-postgres

8️⃣ Access the DuckDB CLI

make exec-duckdb

9️⃣ Access the Pipeline Container CLI

make exec-pipelinebase

🔄 Data Pipeline Commands

Export Data from PostgreSQL to MinIO

make load-db-postgres-to-minio

This command:

  • Exports a sample of data from PostgreSQL to CSV
  • Transfers the CSV to the pipelinebase container
  • Converts the CSV to Parquet and uploads it to MinIO (sketched below)
  • Cleans up temporary files
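
Below is a hedged Python sketch of the conversion-and-upload leg, using pandas and the minio client. The endpoint, credentials, bucket, and file names are assumptions for illustration; the Makefile target wires up the real values:

# Sketch of the CSV -> Parquet -> MinIO leg of the pipeline. Endpoint,
# credentials, bucket, and file names are illustrative assumptions.
import pandas as pd
from minio import Minio

# Convert the exported CSV to Parquet (columnar, compressed).
df = pd.read_csv("/tmp/raw_events.csv")
df.to_parquet("/tmp/raw_events.parquet", index=False)  # requires pyarrow

# Upload the Parquet file to a MinIO bucket over the S3 API.
client = Minio("localhost:9000",           # assumed MinIO endpoint
               access_key="minioadmin",    # assumed default credentials
               secret_key="minioadmin",
               secure=False)
if not client.bucket_exists("sonnet-data"):
    client.make_bucket("sonnet-data")
client.fput_object("sonnet-data", "raw_events.parquet", "/tmp/raw_events.parquet")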

Import Data from MinIO to DuckDB

make load-db-minio-to-duckdb
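
Under the hood, an import like this can be expressed with DuckDB's httpfs extension, which reads Parquet straight from an S3-compatible store. A minimal sketch, assuming the same MinIO endpoint, credentials, and bucket as the upload sketch above:

# Assumed endpoint, credentials, and paths, matching the upload sketch.
import duckdb

con = duckdb.connect("warehouse.duckdb")    # assumed local DuckDB file
con.execute("INSTALL httpfs;")
con.execute("LOAD httpfs;")
con.execute("SET s3_endpoint='localhost:9000';")
con.execute("SET s3_access_key_id='minioadmin';")
con.execute("SET s3_secret_access_key='minioadmin';")
con.execute("SET s3_use_ssl=false;")
con.execute("SET s3_url_style='path';")     # MinIO uses path-style URLs
con.execute("""
    CREATE OR REPLACE TABLE raw_events AS
    SELECT * FROM read_parquet('s3://sonnet-data/raw_events.parquet')
""")
print(con.execute("SELECT COUNT(*) FROM raw_events").fetchone())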

Check MinIO Status and Contents

make check-minio

Verify Data in DuckDB

make check-duckdb

Run the Complete Data Pipeline

make run-all-data-pipelines

This runs the entire ETL process from PostgreSQL to MinIO to DuckDB.

🧹 Environment Management

Stop All Containers

make stop

Rebuild Containers

make rebuild

Complete Rebuild (Clean)

make rebuild-clean

This removes all containers, volumes, and images before rebuilding from scratch.

Check Container Status

make status

View Container Logs

make logs

For a specific container: make logs c=container_name

📜 Project Structure

📂 sonnet-scripts
│── 📂 pythonbase/         # Python-based processing container
│── 📂 pipelinebase/       # ETL pipeline and data ingest container
│── 📂 linuxbase/          # Base container for Linux dependencies
│── 📂 jupyterbase/        # Jupyter container for analytics and data science
│── 🐳 docker-compose.yml  # Container orchestration
│── 🛠 Makefile            # Automation commands
│── 📜 README.md           # You are here!

🛠 CI/CD Pipeline

GitHub Actions automates builds, tests, and environment validation. The pipeline:

  1. Builds Docker images (pythonbase, linuxbase)
  2. Starts all services using docker compose
  3. Runs unit & integration tests (make test)
  4. Shuts down containers after tests pass.

✅ CI is triggered on:

  • Push to main or feature/*
  • Pull Requests to main

🤝 Contributing

Want to improve Sonnet Scripts? Here's how:

  1. Fork the repository
  2. Make your changes and test them locally
  3. Submit a pull request (PR) for review

For major changes, please open an issue first to discuss your proposal.

We follow Conventional Commits for all commit messages.

📧 Support & Questions

If you have questions or encounter issues, feel free to:

  • Open a GitHub issue
  • Contact the maintainer directly via LinkedIn
  • COMING SOON: Join our Discord community

🚀 Happy data wrangling!
